Apache Security - Ivan Ristic [66]
The concept of cacheability is important if you are preparing for a period of increased traffic, but it can and should also be used as a general technique to lower bandwidth consumption. Content is said to be cacheable when it is accompanied by HTTP response headers that indicate when it was created and how long it will remain fresh. Making content cacheable results in browsers and proxies sending fewer requests, because they do not check for updates to content they know is still fresh, and this results in lower bandwidth usage.
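One common way to make static content cacheable is with the mod_expires module. The following fragment is only a sketch; the content types and lifetimes are illustrative and should be adapted to how often your content actually changes:

```apache
# Requires mod_expires to be compiled in or loaded
ExpiresActive On

# Images rarely change; let caches consider them fresh for a month
ExpiresByType image/png  "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"

# HTML pages may change more often, so use a shorter lifetime
ExpiresByType text/html  "access plus 1 hour"
```

With these directives in place, Apache adds Expires (and, in newer versions, Cache-Control) headers to matching responses, so clients can skip revalidation entirely while the content is still fresh.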
By default, Apache does a reasonable job of making static documents cacheable. After receiving a static page or an image from the web server once, a browser makes subsequent requests for the same resource conditional (typically via the If-Modified-Since request header). Such a request essentially says, "Send me the resource identified by this URL only if it has changed since I last requested it." If the resource has not changed, instead of returning the status 200 (OK) with the resource attached, Apache returns 304 (Not Modified) with no body.
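A conditional exchange looks roughly like this (headers abbreviated, and the hostname and date made up for illustration):

```
GET /logo.png HTTP/1.1
Host: www.example.com
If-Modified-Since: Sat, 01 May 2004 10:00:00 GMT

HTTP/1.1 304 Not Modified
```

Because the 304 response carries no body, the only cost is the request/response round trip; the image itself is never retransmitted.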
Problems can arise when content is served through applications that are not designed with cacheability in mind. Most application servers completely disable caching under the (valid) assumption that it is better for applications not to have responses cached. This does not work well for content-serving web sites.
A good approach is to use a cacheability engine to test how cacheable an application's responses are, and then talk to the programmers about enhancing the application with support for HTTP caching.
Detailed information about caching and cacheability is available at:
"Caching Tutorial for Web Authors and Webmasters" by Mark Nottingham (http://www.mnot.net/cache_docs/)
"Cacheability Engine" (http://www.mnot.net/cacheability/)
Real-Life Client Problems
Assume you have chosen to serve a maximum of one hundred requests at any given time. After performing a few tests from the local network, you may observe that Apache serves requests quickly and conclude that you will never reach that maximum. There are some things to watch for in real life:
Slow clients
Measuring the speed of request serving from the local network can be deceptive. Real clients will connect at a variety of speeds, many of them over slow modems. Apache will be ready to serve the request quickly, but the clients will not be ready to receive it. A 20-KB page is roughly 164,000 bits; even assuming the client's modem runs at its full 28.8 Kbps with no other bottlenecks (a brave assumption), sending the page takes over six seconds once protocol overhead is included. During this period, one Apache process will not be able to do anything else.
Large files
Large files take longer to download than small files. If you make a set of large files available for download, be aware that Apache will use one process for each file being downloaded. Worse, users may run special download software packages (known as download accelerators) that open multiple simultaneous requests for the same file. For most users, however, the bottleneck is their own network connection, so these additional requests do nothing to improve download speed; the connection is already saturated.
Keep-Alive functionality
Keep-Alive is an HTTP protocol feature that allows clients to remain connected to the server between requests, avoiding the cost of re-establishing a TCP/IP connection for every request. Most web site users are slow with their requests, so the time Apache spends waiting before realizing the next request is not coming is time wasted. The timeout is set to 15 seconds by default, which means 15 seconds during which one process does nothing. You can keep this feature enabled until you reach the maximum capacity of the server. If that happens, you can turn it off or reduce the timeout Apache uses to wait for the next request. Newer Apache versions are likely to be improved to allow
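Keep-Alive behavior is controlled by a handful of standard directives. The values below are illustrative starting points, not recommendations; the right settings depend on your traffic patterns:

```apache
# Keep the feature on, but wait less time for a follow-up request
KeepAlive On
KeepAliveTimeout 5
MaxKeepAliveRequests 100

# Or, once the server is saturated, disable it entirely:
# KeepAlive Off
```

Lowering KeepAliveTimeout from the default 15 seconds frees processes sooner while still letting fast clients reuse connections for consecutive requests, such as a page and its images.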