Squid: The Definitive Guide - Duane Wessels [47]
How Squid Matches Access Control Elements
It is important to understand how Squid searches ACL elements for a match. When an ACL element has more than one value, any single value can cause a match. In other words, Squid uses OR logic when checking ACL element values. Squid stops searching when it finds the first value that causes a match. This means that you can reduce delays by placing likely matches at the beginning of a list.
Let's look at a specific example. Consider this ACL definition:
acl Simpsons ident Maggie Lisa Bart Marge Homer
When Squid encounters the Simpsons ACL in an access list, it performs the ident lookup. Let's see what happens when the user's ident server returns Marge. Squid's ACL code compares this value to Maggie, Lisa, and Bart before finding a match with Marge. At this point, the search terminates, and we say that the Simpsons ACL matches the request.
Actually, that's a bit of a lie. The ident ACL values aren't stored as an unordered list. Rather, they are stored as a splay tree. This means that Squid doesn't end up searching all the names in the event of a nonmatch. Searching a splay tree with N items requires on the order of log(N) comparisons. Many other ACL types use splay trees as well. The regular expression-based types, however, don't.
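To get a feel for what logarithmic lookup buys you, here is a minimal Python sketch. Note that it is not a splay tree and not Squid's code: it uses a plain binary search over a sorted list as a stand-in, simply because that is the shortest way to show a comparison count that grows with log(N) rather than N. The names sorted_lookup and NAMES are made up for this illustration.

```python
def sorted_lookup(sorted_names, target):
    """Binary search over a sorted list; returns (found, comparisons).

    Stand-in for a tree lookup: each loop iteration counts as one
    comparison step and halves the remaining search range.
    """
    low, high = 0, len(sorted_names) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        if sorted_names[mid] == target:
            return True, comparisons
        if sorted_names[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return False, comparisons


# Hypothetical data: the names from the Simpsons ACL, sorted.
NAMES = sorted(["Maggie", "Lisa", "Bart", "Marge", "Homer"])

found, steps = sorted_lookup(NAMES, "Marge")
missing, miss_steps = sorted_lookup(NAMES, "Ned")
```

With only five names the savings are tiny, but the same function applied to a list of 1,024 names never needs more than 11 comparison steps, even for a name that isn't present.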
Since regular expressions can't be sorted, they are stored as linked lists. This makes them inefficient for large lists, especially for requests that don't match any of the regular expressions in the list. In an attempt to improve this situation, Squid moves a regular expression toward the top of the list when a match occurs. More precisely, due to the nature of the ACL matching code, Squid moves matched entries to the second position in the list. Either way, commonly matched values naturally migrate toward the top of the ACL list, which should reduce the number of comparisons.
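The reordering idea can be sketched in a few lines of Python. This is only an illustration of the technique, not Squid's implementation: it uses a Python list instead of a linked list, and it promotes a matched pattern all the way to the front rather than to the second position. The class name RegexList and the sample patterns are hypothetical.

```python
import re


class RegexList:
    """A list of regular expressions searched in order, with move-to-front."""

    def __init__(self, patterns):
        self.patterns = [re.compile(p) for p in patterns]

    def match(self, text):
        """Return (matched, patterns_tried) and promote a matching pattern."""
        for index, pattern in enumerate(self.patterns):
            if pattern.search(text):
                # Promote the matched pattern so it is tried first next time.
                self.patterns.insert(0, self.patterns.pop(index))
                return True, index + 1
        # A nonmatch is the expensive case: every pattern was tried.
        return False, len(self.patterns)


# Hypothetical patterns, loosely in the style of a url_regex ACL.
media = RegexList([r"\.gif$", r"\.jpg$", r"\.mp3$"])

first = media.match("http://example.com/song.mp3")    # tries all three
second = media.match("http://example.com/song.mp3")   # now found at once
miss = media.match("http://example.com/index.html")   # tries all three
```

The first lookup for the MP3 URL tries three patterns, the second only one. The request that matches nothing still pays for the full list, which is why long regular expression lists are costly.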
Let's look at another simple example:
acl Schmever port 80-90 101 103 107 1 2 3 9999
This ACL matches a request to any origin server port from 80 through 90, or to any of the other individually listed port numbers. For a request to port 80, Squid matches the ACL by looking at the first value. For port 9999, all the other values are checked first. For a port number not listed, Squid checks every value before declaring that the ACL doesn't match. As I've said before, you can optimize the ACL matching by placing the more common values first.
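The following Python sketch mimics the first-match scan just described, so you can count how many values are examined for a given port. It is an approximation of the behavior in the text, not Squid's actual code; the helper names parse_port_values and match_port are invented for this example.

```python
def parse_port_values(spec):
    """Turn 'A-B' or 'N' tokens into (low, high) tuples, keeping their order."""
    ranges = []
    for token in spec.split():
        low, _, high = token.partition("-")
        # A single port N is treated as the range N-N.
        ranges.append((int(low), int(high or low)))
    return ranges


def match_port(ranges, port):
    """Return (matched, values_checked) using a first-match linear scan."""
    for checked, (low, high) in enumerate(ranges, start=1):
        if low <= port <= high:
            return True, checked
    return False, len(ranges)


# The values from the Schmever ACL, in the order they were written.
SCHMEVER = parse_port_values("80-90 101 103 107 1 2 3 9999")

port_80 = match_port(SCHMEVER, 80)
port_9999 = match_port(SCHMEVER, 9999)
port_8080 = match_port(SCHMEVER, 8080)
```

Port 80 matches on the first value, port 9999 only on the eighth, and port 8080 is rejected after all eight values have been checked. Moving 9999 to the front of the list would make it match on the first value instead.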
* * *
[1] CIDR stands for Classless Inter-Domain Routing. It is from an Internet-wide effort to support routing by any prefix length, instead of the old class A, B, and C subnet lengths.
[2] Apart from access controls, Squid only needs an origin server's IP address when establishing a connection to that server. DNS lookups normally occur much later in request processing. If the HTTP request results in a cache hit, Squid doesn't need to know the server's address. Additionally, Squid doesn't need IP addresses for cache misses that are forwarded to a neighbor cache.
[3] For the RFC database, visit http://www.rfc-editor.org/rfc.html.
Access Control Rules
As I mentioned earlier, ACL elements are the first step in building access controls. The second step is the access control rules, where you combine elements to allow or deny certain actions. You've already seen some http_access rules in the preceding examples. Squid has a number of other access control lists:
http_access
This is your most important access list. It determines which client HTTP requests are allowed, and which are denied. If you get the http_access configuration wrong, your Squid cache may be vulnerable to attacks and abuse from people who shouldn't have access to it.
http_reply_access
The http_reply_access list is similar to http_access. The difference is that http_reply_access is checked when Squid receives a reply from an origin server or upstream proxy. Most access controls are based on aspects of the client's request, in which case the http_access list is sufficient. However, some people prefer also to allow or deny requests based on