
Working with the reverse proxy servers

The web pages and other data served both on the Internet (EUROPA) and the Intranet (IntraComm) are made available via reverse proxy servers. In other words, we are using proxy servers as stand-ins for the content servers. Netscape describes this function as follows:
When a client makes a request to the web site, the request goes to the proxy server. The proxy server then sends the client's request [...] to the content server. The content server passes the result [...] back to the proxy. The proxy sends the retrieved information to the client as if the proxy were the actual content server.
RFC 2616 (HTTP/1.1) defines a 'gateway', which is what we call a 'reverse proxy server', as follows:
A server which acts as an intermediary for some other server. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is communicating with a gateway.
The following figure shows how the EUROPA reverse proxy works:
[Figure: a reverse proxy server appears to be the real content server]

The overhead generated by reverse proxy servers can be justified as follows:

Please note that the EUROPA and IntraComm reverse proxy servers are only used as "stand-ins" for content servers that are physically located at the Data Centre. We do not make content servers located elsewhere available through these reverse proxy servers, because we cannot guarantee that those external servers are available at any given time.

The use of reverse proxies does generate some overhead when it comes to creating pages and CGI scripts for our sites. We must therefore insist that you:

use relative links

Allow me to repeat this, because quite a few people out there do not seem to have got the message:

USE RELATIVE LINKS

A link should never contain a host name or an IP address, and most certainly not a locally defined machine name or a test server. There is only one reason to include a host name or an IP address in a link: when you point to another web site.
When you need to point back to the home page of the current site you might use something like: <a href="/">home</a>.

As far as possible, make all links relative to the address of the current page. This makes it possible to move your set of pages elsewhere if the web site is reorganized. Very often sites are developed with one location in mind, but by the time they are made available to the public this location has changed.
Another reason for insisting on the strict application of relative links is that your document is published, or may be published in the future, on two or more of our sites. Our web sites have different structures, which in most cases makes it impossible to use the same URL path for your document on every site.

There are three ways to reference a page "/some/path/page.htm" from a page "/some/page.htm":
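Assuming the three forms meant here are a document-relative path, a root-relative path and a full URL, the following minimal Python sketch shows how each one resolves when the same pages are published on two sites with different structures. The host names "site-a.example" and "site-b.example" and the "/archive" prefix are hypothetical and only serve the illustration.

```python
from urllib.parse import urljoin

# Three candidate ways to write the link to "/some/path/page.htm".
# The host name in the full URL is hypothetical.
TARGETS = {
    "document-relative": "path/page.htm",
    "root-relative": "/some/path/page.htm",
    "full URL": "http://site-a.example/some/path/page.htm",
}

def resolve_all(base_url):
    """Resolve every candidate link against the URL of the referring page."""
    return {kind: urljoin(base_url, ref) for kind, ref in TARGETS.items()}

# The referring page as published on the first (hypothetical) site ...
site_a = resolve_all("http://site-a.example/some/page.htm")
# ... and the same page published under a different path on a second site.
site_b = resolve_all("http://site-b.example/archive/some/page.htm")

for kind in TARGETS:
    print(f"{kind:18} A: {site_a[kind]}")
    print(f"{'':18} B: {site_b[kind]}")
```

In this sketch only the document-relative form still points at the moved copy on the second site. The root-relative form loses the "/archive" prefix, and the full URL keeps pointing at the first site.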

A hint for developers: the environment variable "HTTP_FORWARDED" will give you a string which tells you which proxy server(s) has (have) forwarded the current request. This could be something like: "by http://www.cc.cec:80 (Netscape-Proxy/3.52)".
When accessing via https://intracomm.cec.eu.int you would see: "by http://intracomm.cec.eu.int:443 (iPlanet-Web-Proxy-Server/3.6), by http://www.cc.cec:80 (Netscape-Proxy/3.52)".
Use this variable instead of "HTTP_HOST" or "SERVER_NAME", because those variables contain the hostname:port of the content server, which in most cases is not directly accessible to the public. This is useful when you need to generate an HTML "base" tag. For an example, see the demo script used in the chapter on CGI scripts and programs.
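As an illustration, a CGI script could split the "HTTP_FORWARDED" string into its individual proxy entries along the following lines. This is a minimal Python sketch, not the demo script mentioned above. The function names are hypothetical, the pattern is derived only from the two sample strings shown here, and the idea that the first entry names the public-facing proxy is an assumption based on the IntraComm example.

```python
import os
import re

# One entry looks like: by http://www.cc.cec:80 (Netscape-Proxy/3.52)
# The pattern is inferred from the sample strings only (assumption).
_ENTRY = re.compile(r"by\s+(\S+)\s+\(([^)]*)\)")

def parse_forwarded(value):
    """Return a list of (proxy_url, proxy_software) pairs, in string order."""
    return _ENTRY.findall(value or "")

def first_proxy_hostport(value):
    """Return 'host:port' of the first listed proxy, or None if there is none.

    Assumption: the first entry is the proxy the client actually contacted.
    """
    entries = parse_forwarded(value)
    if not entries:
        return None
    url = entries[0][0]
    # Drop the scheme; it may not reflect the scheme the client used.
    return url.split("://", 1)[-1].rstrip("/")

if __name__ == "__main__":
    forwarded = os.environ.get("HTTP_FORWARDED", "")
    print(parse_forwarded(forwarded))
    print(first_proxy_hostport(forwarded))
```

Note that in the IntraComm sample the entry reads "http://...:443" although the site was accessed via https, so the scheme in the string should not be copied blindly into a "base" tag.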

"home" content server versus other content servers

Every one of our web sites consists of at least a reverse proxy server and a "home" content server. The "home" content server contains mostly static web pages, such as the home page of the site, along with ColdFusion pages and any CGI scripts.
There are a few things that will be different depending on where your pages and CGI scripts will be located.

The reverse proxy servers are programmed to map requested URLs by default to the corresponding "home" content server.

Most applications that generate web pages dynamically, and that do not use the ColdFusion server attached to the "home" content server, will be located on content servers other than the "home" content server. As a rule, all the references within an application will have the same "root". In both proxy servers we defined only the URL mappings that map all requests starting with "/idea" to "http://[databasehost:port]/idea", and back.
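The effect of such a mapping, together with the default mapping to the "home" content server, can be modelled roughly as follows. This is a simplified Python sketch and not the actual proxy configuration. The function names and the "[homehost:port]" placeholder are hypothetical, and matching the prefix only on whole path segments is an assumption.

```python
# Simplified model of the proxy's URL mappings (not real proxy configuration).
# "[databasehost:port]" is the placeholder used in the text;
# "[homehost:port]" is a hypothetical placeholder for the "home" content server.
MAPPINGS = [("/idea", "http://[databasehost:port]/idea")]
HOME = "http://[homehost:port]"

def map_request(path):
    """Map a public URL path to the content server URL that serves it."""
    for prefix, target in MAPPINGS:
        # Assumption: the prefix must match a whole path segment.
        if path == prefix or path.startswith(prefix + "/"):
            return target + path[len(prefix):]
    return HOME + path  # default: the "home" content server

def map_response(url):
    """Map a content server URL back to the public path ("and back")."""
    for prefix, target in MAPPINGS:
        if url == target or url.startswith(target + "/"):
            return prefix + url[len(target):]
    if url.startswith(HOME + "/"):
        return url[len(HOME):]
    return url  # not one of our content servers: leave untouched

print(map_request("/idea/search.htm"))
print(map_request("/index.htm"))
print(map_response("http://[databasehost:port]/idea/search.htm"))
```

Because a single prefix covers the whole application, any link inside the application that leaves that prefix, or that names the content server directly, falls outside the mapping.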
Here again, we must insist on the use of relative links. Very often applications are developed with one URL path in mind, but in the end, for one reason or another, a different URL path is chosen. So, in order to save yourself a lot of future trouble and extra expense, make sure that the web application or web site that you are about to develop uses relative links. (See preceding chapter)

This problem of coordinating URLs and mappings does not exist for CFM pages located on the "home" content server, because these pages sit right next to the related HTML pages and images. We therefore recommend using ColdFusion applications, where possible, to access Oracle databases, rather than creating an application that runs behind a separate web server.

Data Centre