UNIX System Administration Handbook - Evi Nemeth [412]
The only way to deal with this growth is to use replication. Whether it’s on a national, regional, or site level, Internet content needs to be more readily available from a closer source as the Internet grows. It just doesn’t make sense to transmit the same popular web page from Australia across a very expensive link to North America millions of times each day. There should be a way to store this information locally after it has crossed the link once. Fortunately, there is.
One answer is the freely available Squid Internet Object Cache.
This package is both a caching and a proxy server that runs under UNIX and supports several protocols, including HTTP, FTP, Gopher, and SSL.
Here’s how it works. Client web browsers (such as Netscape and Internet Explorer) contact the Squid server to request an object from the Internet. The Squid server then makes a request on the client’s behalf (or provides the object from its cache, as discussed in the following paragraph) and returns the result to the client. Proxy servers of this type are often used to enhance security or filter content.
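The mechanical difference between a proxied request and an ordinary one is visible on the wire: a browser talking to a proxy such as Squid puts the complete URL on the request line, whereas a direct request carries only the path. The sketch below shows the kind of request a client would send to a Squid server (by default, Squid listens on port 3128); www.example.com is a placeholder host.

```shell
# What a browser sends to a proxy server: note the absolute URL on the
# GET line instead of the bare path /index.html used in a direct request.
printf 'GET http://www.example.com/index.html HTTP/1.0\r\nHost: www.example.com\r\n\r\n'
```

The proxy parses the full URL, fetches the object from the origin server (or its cache), and relays the response back to the client.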
In a proxy-based system, only one machine needs to have direct access to the Internet through the organization’s firewall. At organizations such as K-12 schools, a proxy server can also filter content so that inappropriate material doesn’t fall into the wrong hands. Many commercial and freely available proxy servers (some based on Squid, some not) are available today.
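Squid implements this kind of filtering with access control lists in squid.conf. The fragment below is a minimal sketch: the acl and http_access directives are real Squid configuration keywords, but the domain name is a hypothetical example.

```
# Deny requests for a hypothetical objectionable domain, allow the rest.
# The leading dot matches the domain and all of its subdomains.
acl badsites dstdomain .objectionable-site.com
http_access deny badsites
http_access allow all
```

Rules are evaluated in order, so the deny line must appear before the general allow.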
Proxy service is nice, but it’s the caching features of Squid that are really worth getting excited about. Squid not only caches information from local user requests, but it also allows a hierarchy of Squid servers to be constructed. Groups of Squid servers use the Internet Cache Protocol (ICP) to communicate information about what’s in their caches.
This feature allows administrators to build a system in which local users contact an on-site caching server to obtain content from the Internet. If another user at that site has already requested the same content, a copy can be returned at LAN speed (usually, 10 or 100 Mb/s). If the local Squid server doesn’t have it, perhaps it contacts the regional caching server. As in the local case, if anyone in the region has requested the object, it is served immediately. If not, perhaps the caching server for the country or continent can be contacted, and so on. Users perceive a performance improvement, so they are happy.
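A hierarchy like the one described above is declared with cache_peer lines in squid.conf. The host names below are placeholders; the ports shown are Squid's defaults (3128 for HTTP, 3130 for ICP).

```
# Ask a regional parent cache for objects we don't have ourselves.
cache_peer regional-cache.example.net  parent   3128 3130

# Check a neighboring site's cache via ICP, but never fetch through
# it for misses; a sibling only serves objects it already holds.
cache_peer sibling-cache.example.net   sibling  3128 3130
```

The distinction between parent and sibling is what lets ICP queries resolve quickly: siblings answer only for cache hits, while a parent will go to the origin server on the local cache's behalf.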
For many, Squid offers economic benefits. Because users tend to share web discoveries, significant duplication of external web requests can occur at a reasonably sized site. One study has shown that running a caching server can reduce external bandwidth requirements by up to 40%. This extra efficiency can be a big win at sites that pay for usage by the minute or the megabyte.
Setting up Squid
Squid is easy to install and configure and runs on most modern UNIX architectures. Since Squid needs space to store its cache, you should run it on a dedicated machine that has a lot of free memory and disk space. A reasonable configuration would be a machine with 256 MB of RAM and 20 GB of disk space.
You can download a fresh copy of Squid from squid.nlanr.net. After unpacking the distribution, you run the configure script at the top of the tree. This script assumes that you wish to install the package in /usr/local/squid. If you prefer some other location, use the --prefix=dir option to configure.
After configure has completed, run make all and then make install. Next, localize the configuration file, /usr/local/squid/etc/squid.conf. See the QUICKSTART file in the distribution directory for a list of the changes you must make to the sample squid.conf file.
You must also run /usr/local/squid/bin/squid -z by hand to build and zero out the directory structure in which cached web pages will be stored. Finally, you can start the server by hand with the /usr/local/squid/bin/RunCache script; you will eventually want to call this script from your
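The steps above can be collected into a short shell sketch. It assumes the default /usr/local/squid prefix and that you run it from the top of the unpacked source tree; as noted above, you must still edit squid.conf by hand before starting the server.

```shell
#!/bin/sh
# Build and install Squid under the default prefix.
./configure --prefix=/usr/local/squid
make all
make install

# Edit /usr/local/squid/etc/squid.conf per the QUICKSTART file here,
# then create the on-disk cache directory structure.
/usr/local/squid/bin/squid -z

# Start the server; call RunCache from your startup scripts as well.
/usr/local/squid/bin/RunCache &
```

RunCache restarts squid if it dies, which is why it, rather than the squid binary itself, is the thing to invoke at boot time.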