Squid_ The Definitive Guide - Duane Wessels [97]
Compared to ICP, Cache Digests trade time for space. Whereas ICP queries incur time penalties (latency), digests incur space (memory, disk) penalties. In Squid, a neighbor's digest is stored entirely in memory. A typical digest requires about 625 KB of memory for every million objects.
The Bloom filter is an interesting data structure that provides lossy encoding of a collection of items. The filter itself is simply a large array of bits. Given a Bloom filter (and the parameters used to generate it), you can find, with some uncertainty, if a particular item is in the collection. In Squid, items are URIs, and the digest is sized at 5 bits per cached object. For example, you can represent the collection of 1,000,000 cached objects with a filter of 5,000,000 bits, or 625,000 bytes.
Due to their nature, Bloom filters aren't a perfect representation of the collection. They sometimes incorrectly indicate that a particular item is present in the collection because two or more items may turn on the same bit. In other words, the filter can indicate that object X is in the cache, even though X was never cached or requested. These false positives occur with a certain probability you can control by adjusting the parameters of the filter. For example, increasing the number of bits per object decreases the false positive probability. See my O'Reilly book, Web Caching, for many more details about Cache Digests.
Configuring Squid for Cache Digests
First of all, you must compile Squid with the Cache Digest code enabled. Simply add the —enable-cache-digests option when running ./configure. Taking this step causes two things to happen when you run Squid:
Your Squid cache generates a digest of its own contents. Your neighbors may request this digest if they are also configured to use Cache Digests.
Your Squid requests a Cache Digest from each of its neighbors.
If you don't want to request digests for a particular neighbor, use the no-digest option on the cache_peer line. For example:
cache_peer neighbor.host.name parent 3128 3130 no-digest
Squid stores its own digest under the following URL: http://my.host.name:port/squid-internal-periodic/store_digest. When Squid requests a neighbor's digest, it simply requests http://neighbor.host.name:port/squid-internal-periodic/store_digest. Obviously, this naming scheme is specific to Squid. If you have a non-Squid neighbor that supports Cache Digests, you may need to tell your Squid that the neighbor's digest has a different address. The digest-url=url option to cache_peer allows you to configure the URL for the neighbor's Cache Digest. For example:
cache_peer neighbor.host.name parent 3128 3130 digest-url=http://blah/digest
squid.conf has a number of directives that control the way in which Squid generates its own Cache Digest. First, the digest_generation directive controls whether or not Squid generates a digest of its cache. You might want to disable digest generation if your cache is a child to a parent, but not a parent or sibling to any other caches. The remaining directives control low-level underlying details of digest generation. You should change them only if you fully understand the Cache Digest implementation.
The digest_bits_per_entry determines the size of the digest. The default value is 5. Increasing the value results in larger digests (consuming more memory and bandwidth) and lower false-hit probabilities. A lower setting results in smaller digests and more false hits. I feel that the default setting is a very nice tradeoff. A setting of 3 or lower has too many false hits to be useful, and a setting of 8 or higher simply wastes bandwidth.
Squid uses a two-step process to create a cache digest. First, it builds the cache digest data structure. This is basically a large Bloom filter and small header that contains the digest parameters. Once the data structure is filled, Squid creates a cached HTTP response for the