Squid_ The Definitive Guide - Duane Wessels [8]
A cache hit occurs each time Squid satisfies an HTTP request from its cache. The cache hit ratio, or cache hit rate, is the percentage of all requests satisfied as hits. Web caches typically achieve hit ratios between 30% and 60%. A similar metric, the byte hit ratio, is the percentage of all bytes delivered to clients that came from the cache rather than from origin servers.
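The two ratios can differ considerably, because they weight requests differently. The following Python sketch computes both from a list of request records. The `Request` record and the `hit_ratios` function are hypothetical illustrations, not part of Squid, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Request:
    hit: bool        # True if the request was satisfied from the cache
    size_bytes: int  # number of bytes sent to the client


def hit_ratios(requests):
    """Return (hit_ratio, byte_hit_ratio), both as percentages."""
    if not requests:
        return 0.0, 0.0
    hits = sum(1 for r in requests if r.hit)
    total_bytes = sum(r.size_bytes for r in requests)
    hit_bytes = sum(r.size_bytes for r in requests if r.hit)
    hit_ratio = 100.0 * hits / len(requests)
    byte_hit_ratio = 100.0 * hit_bytes / total_bytes if total_bytes else 0.0
    return hit_ratio, byte_hit_ratio


# Hypothetical sample: two small hits and two misses, one of them large.
log = [Request(True, 1000), Request(False, 9000),
       Request(True, 500), Request(False, 500)]
print(hit_ratios(log))
```

Here half of the requests are hits (a 50% hit ratio), but only 1,500 of the 11,000 bytes came from the cache, so the byte hit ratio is only about 13.6%. A cache that mostly hits on small objects shows exactly this pattern.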
A cache miss occurs when Squid can't satisfy a request from the cache. A miss can happen for any number of reasons. Obviously, the first time Squid receives a request for a particular resource, it is a cache miss. Similarly, Squid may have purged the cached copy to make room for new objects.
Another possibility is that the resource is uncachable. Origin servers can instruct caches on how to treat the response. For example, they can say that the data must never be cached, can be reused only within a certain amount of time, and so on. Squid also uses a few internal heuristics to determine what should, or should not, be saved for future use.
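As a rough illustration of how an origin server's instructions affect cacheability, here is a simplified Python sketch that inspects the HTTP/1.1 `Cache-Control` response header as a shared cache might. It considers only the `no-store`, `private`, `s-maxage`, and `max-age` directives; it does not reproduce Squid's actual rules or its internal heuristics. The function names are hypothetical, and the headers are assumed to be a plain dictionary keyed by the exact name `"Cache-Control"`.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def cache_policy(headers):
    """Return (cacheable, lifetime_seconds or None) for a shared cache.

    Simplified sketch: ignores every directive except those listed above.
    """
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    # "no-store" forbids caching; "private" forbids caching in shared caches.
    if "no-store" in cc or "private" in cc:
        return False, None
    # For shared caches, s-maxage takes precedence over max-age.
    for key in ("s-maxage", "max-age"):
        arg = cc.get(key)
        if arg is not None and arg.isdigit():
            return True, int(arg)
    # No explicit lifetime: a real cache would fall back on heuristics here.
    return True, None


print(cache_policy({"Cache-Control": "public, max-age=3600"}))
```

The last branch is where heuristics, such as those Squid applies internally, would come into play.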
Cache validation is a process that ensures Squid doesn't serve stale data to the user. Before reusing a cached response, Squid often validates it with the origin server. If the server indicates that Squid's copy is still valid, Squid sends its cached copy to the client. Otherwise, Squid updates its cached copy as it relays the response to the client. Squid generally performs validation using timestamps. The origin server's response usually contains a last-modified timestamp. Squid sends that timestamp back to the origin server to find out whether the resource has changed.
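Timestamp-based validation in HTTP is a conditional GET. The cache echoes the stored `Last-Modified` value in an `If-Modified-Since` request header, and the server answers `304 Not Modified` if the resource is unchanged or `200 OK` with a new body if it has changed. The Python sketch below models that exchange without any network I/O. The origin server is simulated, and `CachedEntry`, `build_validation_request`, `origin_reply`, and `revalidate` are hypothetical names chosen for illustration, not Squid internals.

```python
from dataclasses import dataclass
from email.utils import parsedate_to_datetime


@dataclass
class CachedEntry:
    body: bytes
    last_modified: str  # Last-Modified value from the cached response


def build_validation_request(entry):
    """Headers for a conditional GET that revalidates the cached copy."""
    return {"If-Modified-Since": entry.last_modified}


def origin_reply(current_last_modified, current_body, request_headers):
    """Simulated origin: 304 if unchanged since If-Modified-Since, else 200."""
    ims = request_headers.get("If-Modified-Since")
    if ims and parsedate_to_datetime(current_last_modified) <= parsedate_to_datetime(ims):
        return 304, None, None
    return 200, current_body, current_last_modified


def revalidate(entry, status, body, last_modified):
    """Return (body sent to the client, entry kept in the cache)."""
    if status == 304:
        # Not Modified: the cached copy is still valid, so serve it.
        return entry.body, entry
    if status == 200:
        # Changed: relay the new response and replace the cached copy.
        return body, CachedEntry(body, last_modified)
    raise ValueError(f"unhandled status {status}")  # sketch covers only 200/304


cached = CachedEntry(b"old page", "Mon, 01 Jan 2024 00:00:00 GMT")
reply = origin_reply("Tue, 02 Jan 2024 00:00:00 GMT", b"new page",
                     build_validation_request(cached))
sent, cached = revalidate(cached, *reply)
print(sent)
```

In this run the simulated resource changed after the cached timestamp, so the origin returns 200 and the cache both relays and stores the new body. Had the timestamps matched, the 304 reply would carry no body and the cached copy would be served.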
For a detailed treatment of web caching, have a look at my book Web Caching, also by O'Reilly.
* * *
[1] Gopher servers are quite rare these days. Squid also knows about WAIS and whois, but these are even more obscure.
A Brief History of Squid
In the beginning was the CERN HTTP server. In addition to functioning as an HTTP server, it was also the first caching proxy. The caching module was written by Ari Luotonen in 1994.
That same year, the Internet Research Task Force Group on Resource Discovery (IRTF-RD) started the Harvest project. It was "an integrated set of tools to gather, extract, organize, search, cache, and replicate" Internet information. I joined the Harvest project near the end of 1994. While most people used Harvest as a local (or distributed) search engine, the Object Cache component was quite popular as well. The Harvest cache boasted three major improvements over the CERN cache: faster use of the filesystem, a single-process design, and caching hierarchies via the Internet Cache Protocol.
Towards the end of 1995, many Harvest team members made the move to the exciting world of Internet-based startup companies. The original authors of the Harvest cache code, Peter Danzig and Anawat Chankhunthod, turned it into a commercial product. Their company was later acquired by Network Appliance. In early 1996, I joined the National Laboratory for Applied Network Research (NLANR) to work on the Information Resource Caching (IRCache) project, funded by the National Science Foundation. Under this project, we took the Harvest cache code, renamed it Squid, and released it under the GNU General Public License.
Since that time Squid has grown in size and features. It now supports a number of cool things such as URL redirection, traffic shaping, sophisticated access controls, numerous authentication modules, advanced disk storage options, HTTP interception, and