Apache Security - Ivan Ristic [138]
Management node clusters
A different approach to solving the DNSRR node failure problem is to introduce a central management node to the cluster (Figure 9-8). In this configuration, cluster nodes are given private addresses. The system as a whole has only one IP address, which is assigned to the management node. The management node will do the following:
Monitor cluster nodes for failure
Measure utilization of cluster nodes
Distribute incoming requests
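The three duties listed above can be illustrated with a minimal sketch. It is not how any particular product works: the Node and ManagementNode classes, the caller-supplied probe function, and the use of in-flight request counts as the utilization measure are all assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A cluster node with a private address (hypothetical structure)."""
    address: str
    healthy: bool = True
    active_requests: int = 0  # crude stand-in for "utilization"


class ManagementNode:
    """Illustrative only: monitors nodes, tracks load, distributes requests."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def monitor(self, probe):
        """Mark each node healthy or failed using a caller-supplied probe."""
        for node in self.nodes:
            node.healthy = bool(probe(node.address))

    def pick(self):
        """Return the healthy node with the fewest requests in flight."""
        candidates = [n for n in self.nodes if n.healthy]
        if not candidates:
            raise RuntimeError("no healthy cluster nodes")
        return min(candidates, key=lambda n: n.active_requests)

    def dispatch(self, handler):
        """Send one request to the chosen node and track its load."""
        node = self.pick()
        node.active_requests += 1
        try:
            return handler(node.address)
        finally:
            node.active_requests -= 1
```

Real management nodes typically make this decision per TCP connection rather than per function call, and use richer health checks and load metrics than the ones assumed here.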
Figure 9-8. Classic load balancing architecture
To avoid a central point of failure, the management node itself is clustered, usually in a failover mode with an identical copy of itself (though you can use a DNSRR solution with an IP address for each management node).
This is a classic high-availability/load-balancing architecture. Distribution is often performed on the TCP/IP level so the cluster can work for any protocol, including HTTP (though all solutions offer various HTTP extensions). It is easy, well understood, and widely deployed. The management nodes are usually off-the-shelf products, often quite expensive but quite capable, too. These products include:
Foundry Networks ServerIron (http://www.foundrynet.com/products/webswitches/serveriron/)
F5 Networks BigIP (http://www.f5.com/f5products/bigip/)
Cisco LocalDirector (http://www.cisco.com/warp/public/cc/pd/cxsr/400/)
An open source alternative for Linux is the Linux Virtual Server project (http://www.linuxvirtualserver.org). It provides tools to create a high-availability cluster (or management node) out of cheap commodity hardware.
* * *
Session Affinity
The management node cluster distributes load on a per-request basis. Since HTTP is a stateless protocol, consecutive requests from the same user may be served by different cluster nodes. This creates a problem for applications that were not designed to work in a cluster and therefore keep session state on individual nodes. The term session affinity describes a cluster that always sends a given user to the same cluster node. The terms sticky sessions and server affinity are often used as synonyms for session affinity.
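One simple way to obtain session affinity is to derive the target node from something that stays constant for the user, such as a session identifier. The sketch below assumes that such an identifier is available to the balancer. The function name pick_node and the choice of SHA-256 are illustrative and not taken from any particular product.

```python
import hashlib


def pick_node(session_id, nodes):
    """Map a session identifier to one node, deterministically."""
    if not nodes:
        raise ValueError("no cluster nodes configured")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer index into the list.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

With an unchanged node list, the same session identifier always maps to the same node. When a node is added or removed, many sessions move to a different node, so schemes like this one are usually combined with one of the state-handling strategies described in this sidebar.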
Session affinity is especially important (for performance reasons) when SSL is used. To take advantage of SSLv3 sessions (which can be quickly resumed, as discussed in Chapter 4), consecutive user requests must arrive at the same cluster node.
An alternative to having a session-aware cluster is to deploy an application that conforms to one of the following:
Does not keep state
Keeps state on the client (cookies)
Keeps state in a central location (usually a database)
Replicates state across cluster nodes
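As an illustration of the second option, keeping state on the client, the following sketch stores the state in a cookie value and signs it so that any cluster node can detect tampering. It assumes that every node shares the same secret key. The names SECRET_KEY, encode_state, and decode_state are hypothetical, and the state is signed but not encrypted, so it must not contain anything the user should not see.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key; in practice a long random value shared by all nodes.
SECRET_KEY = b"replace-with-a-random-secret"


def encode_state(state):
    """Serialize state and append an HMAC so nodes can verify it later."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def decode_state(cookie):
    """Return the state if the signature is valid, otherwise None."""
    payload, sep, signature = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(
        SECRET_KEY, payload.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Compare as bytes, in constant time, to avoid leaking where they differ.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because every node can verify and read the cookie on its own, it no longer matters which node receives the next request.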
* * *
Reverse proxy clusters
Reverse proxy clusters are the same in principle as management node clusters except that they work on the HTTP level and, therefore, only for the HTTP protocol. This type of proxy is of great interest to us because it is the only architecture that allows HTTP firewalling. Commercial solutions that work as proxies are available, but here we will discuss an open source solution based around Apache.
Ralf S. Engelschall, the man behind mod_rewrite, was the first to describe how reverse proxy load balancing can be achieved using mod_rewrite:
"Website Balancing, Practical approaches to distributing HTTP traffic" by Ralf S. Engelschall (http://www.webtechniques.com/archives/1998/05/engelschall/)
First, write a script that generates a list of available cluster nodes and stores it in a file named servers.txt:
# a list of servers to load balance
www www1|www2|www3|www4
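A minimal sketch of such a script follows. It assumes that a node counts as available when it accepts a TCP connection on port 80. The function names, the probe, and the decision to keep the previous file when no node responds are all assumptions made for the illustration.

```python
import os
import socket
import tempfile


def is_alive(host, port=80, timeout=2.0):
    """Assumed availability check: the node accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def render_map(key, hosts):
    """Build the file contents in the format shown above."""
    return "# a list of servers to load balance\n{} {}\n".format(
        key, "|".join(hosts)
    )


def write_map(path, key, candidates, probe=is_alive):
    """Write the list of live hosts to path; return False if none are alive."""
    alive = [host for host in candidates if probe(host)]
    if not alive:
        # Keep the previous file rather than publish an empty list.
        return False
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(render_map(key, alive))
    os.chmod(tmp_path, 0o644)  # the web server must be able to read the file
    os.replace(tmp_path, path)  # atomic swap, so readers never see a partial file
    return True


if __name__ == "__main__":
    write_map("servers.txt", "www", ["www1", "www2", "www3", "www4"])
```

Writing to a temporary file and renaming it means the web server sees either the old list or the new one, never a half-written file.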
The script should be executed every few minutes to regenerate the list. Then configure mod_rewrite to use the list to redirect incoming requests through the internal proxy:
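# Pick one host at random from the "www" entry in servers.txt
RewriteMap servers rnd:/usr/local/apache/conf/servers.txt
# Proxy the request (mod_proxy required) to the chosen host, keeping the path
RewriteRule ^/(.+)$ http://${servers:www}/$1 [P,L]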
In this configuration, mod_rewrite is smart enough to detect when the file servers.txt changes and