Apache Security - Ivan Ristic [132]
This is where the second directive comes in. It instructs the proxy server to observe response headers, modify them to hide the internal information, and respond to its clients with responses that make sense to them.
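To make the idea concrete, here is a minimal Python sketch of the kind of header rewriting described above. It is an illustration only, not how Apache implements it: the function name, the set of headers it touches, and the public name http://www.example.com/ are all assumptions made for the example.

```python
# Illustrative sketch only; the names and the header list are assumptions.
REWRITTEN_HEADERS = ("location", "content-location")


def rewrite_response_headers(headers, internal_base, public_base):
    """Return a copy of headers with internal URLs mapped to the public base."""
    result = {}
    for name, value in headers.items():
        # Only touch headers that carry URLs, and only when they point
        # at the internal server.
        if name.lower() in REWRITTEN_HEADERS and value.startswith(internal_base):
            value = public_base + value[len(internal_base):]
        result[name] = value
    return result


backend_response = {
    "Location": "http://web.internal.com/login",
    "Content-Type": "text/html",
}
client_response = rewrite_response_headers(
    backend_response, "http://web.internal.com/", "http://www.example.com/"
)
# client_response["Location"] is now "http://www.example.com/login"
```

Headers that do not carry a URL, such as Content-Type here, pass through unchanged, so the client sees only the proxy's name.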
Another way to configure a reverse proxy is through mod_rewrite. With the rewrite engine enabled (RewriteEngine On), the following rule has the same effect as the ProxyPass directive above. Note the use of two flags: P hands the rewritten request to mod_proxy, and L marks this as the last rewrite rule to be applied.
RewriteRule ^/?(.*)$ http://web.internal.com/$1 [P,L]
mod_proxy_html
At this point, one problem remains: applications often generate absolute links and embed them in HTML pages. Unlike response headers, which Apache rewrites, absolute links in page content are left unmodified, so they again reveal the real name of the internal server to clients. Standard Apache cannot solve this problem; it takes a third-party module, mod_proxy_html, which is maintained by Nick Kew. It can be downloaded from http://apache.webthing.com/mod_proxy_html/. It requires libxml2, which can be found at http://xmlsoft.org. (Note: the module's author warns against using libxml2 versions lower than 2.5.10.)
To compile the module, I had to pass the compiler the path to libxml2:
# apxs -Wc,-I/usr/include/libxml2 -cia mod_proxy_html.c
Because the module depends on libxml2, you also have to load the libxml2 dynamic library in the httpd.conf configuration file before attempting to load the mod_proxy_html module:
LoadFile /usr/lib/libxml2.so
LoadModule proxy_html_module modules/mod_proxy_html.so
The module looks into every HTML page, searches for absolute links referencing the internal server, and replaces them with links referencing the proxy. To activate this behavior, add the following to the configuration file:
# activate mod_proxy_html
SetOutputFilter proxy-html
# prevent content compression in backend operation
RequestHeader unset Accept-Encoding
# replace references to the internal server
# with references to this proxy
ProxyHTMLURLMap http://web.internal.com/ /
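The effect of such a URL mapping can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the module's actual algorithm: the real module parses HTML with libxml2, while this sketch uses a regular expression over a handful of attributes, and the function name map_urls is hypothetical.

```python
import re


def map_urls(html, internal_prefix, public_prefix):
    """Replace internal_prefix with public_prefix inside link attributes.

    Simplified sketch: only quoted href, src, and action attributes are
    handled, which is an assumption made to keep the example short.
    """
    pattern = re.compile(
        r"""(\b(?:href|src|action)\s*=\s*["'])""" + re.escape(internal_prefix),
        re.IGNORECASE,
    )
    # A function is used as the replacement so that public_prefix is
    # inserted literally, without backslash or group-reference processing.
    return pattern.sub(lambda m: m.group(1) + public_prefix, html)


page = '<a href="http://web.internal.com/docs/index.html">Docs</a>'
print(map_urls(page, "http://web.internal.com/", "/"))
# prints: <a href="/docs/index.html">Docs</a>
```

Links that point elsewhere are left alone, as is any mention of the internal name outside the listed attributes.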
You may be wondering about the directive to prevent compression. If the client supports content decompression, it will state that with an appropriate Accept-Encoding header:
Accept-Encoding: gzip,deflate
If that happens, the backend server will send a compressed response. mod_proxy_html does not know how to handle compressed content, so it fails to do its job. By removing the header from the request, we force uncompressed communication between the reverse proxy and the backend server. This is not a problem: chances are both servers share a fast local network, where compression would do little to improve performance.
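As a rough illustration of what removing the header amounts to, the following Python sketch drops Accept-Encoding from a request before it is forwarded. The function name unset_header and the sample headers are hypothetical, and the sketch is not how Apache implements the directive.

```python
def unset_header(headers, name):
    """Return a copy of headers without the named header.

    Header names are compared case-insensitively, as HTTP requires.
    """
    return {k: v for k, v in headers.items() if k.lower() != name.lower()}


client_request = {
    "Host": "www.example.com",
    "Accept-Encoding": "gzip,deflate",
}
forwarded_request = unset_header(client_request, "Accept-Encoding")
# forwarded_request is {"Host": "www.example.com"}: the backend sees no
# Accept-Encoding header and therefore replies with uncompressed content.
```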
Read Nick's excellent article published in Apache Week, in which he gives more tips and tricks for reverse proxying:
"Running a Reverse Proxy With Apache" by Nick Kew (http://www.apacheweek.com/features/reverseproxies)
There is an unavoidable performance penalty when using mod_proxy_html. To avoid unnecessary slowdown, activate this module only when a problem with absolute links needs to be solved.
Reverse Proxy by Network Design
The most common approach to running a reverse proxy is to design it into the network. The web server is assigned a private IP address (e.g., 192.168.0.1) instead of a real one. The reverse proxy gets a real IP address (e.g., 217.160.182.153), and this address is attached to the domain name (which is www.example.com in the following example). Configuring Apache to respond to a domain name by forwarding requests to another server is trivial:
ProxyPass / http://192.168.0.1/