Squid_ The Definitive Guide - Duane Wessels [37]
Table 6-1 shows Squid's rules for matching domain and hostnames. The first column shows hostnames taken from requested URLs (or client hostnames for srcdomain ACLs). The second column indicates whether the hostname matches an ACL value of lrrr.org. The third column shows whether the hostname matches an ACL value of .lrrr.org. As you can see, the two forms differ only in the second case, where a subdomain matches the value with the leading dot but not the one without it.
Table 6-1. Domain name matching
| URL hostname  | Matches ACL lrrr.org? | Matches ACL .lrrr.org? |
|---------------|-----------------------|------------------------|
| lrrr.org      | Yes                   | Yes                    |
| i.am.lrrr.org | No                    | Yes                    |
| iamlrrr.org   | No                    | No                     |
Domain name matching can be confusing, so let's look at another example. Here are two slightly different ACLs:
acl A dstdomain foo.com
acl B dstdomain .foo.com
A user's request to get http://www.foo.com/ matches ACL B, but not A. ACL A requires an exact string match, but the leading dot in ACL B is like a wildcard.
On the other hand, a user's request to get http://foo.com/ matches both ACLs A and B. Even though there is no word before foo.com in the URL hostname, the leading dot in ACL B still causes a match.
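If you want to check your understanding of these rules, a short Python sketch can help. The function below is not Squid's code; matches_dstdomain is a hypothetical name, and the lowercasing step is an assumption made for this illustration. It simply encodes the two rules described above: a value without a leading dot must equal the hostname, while a value with a leading dot matches the bare domain and any of its subdomains.

```python
def matches_dstdomain(hostname: str, acl_value: str) -> bool:
    """Illustrative model of the matching rules in Table 6-1 (not Squid's code)."""
    # Assumption of this sketch: compare hostnames case-insensitively.
    host = hostname.lower()
    value = acl_value.lower()
    if value.startswith("."):
        # Leading dot: match the bare domain itself or any subdomain of it.
        return host == value[1:] or host.endswith(value)
    # No leading dot: the hostname must match exactly.
    return host == value


if __name__ == "__main__":
    # Reproduce the rows of Table 6-1.
    for host in ("lrrr.org", "i.am.lrrr.org", "iamlrrr.org"):
        print(host,
              matches_dstdomain(host, "lrrr.org"),
              matches_dstdomain(host, ".lrrr.org"))
```

Running the sketch against www.foo.com and foo.com gives the same answers as ACLs A and B above: only the dotted value matches www.foo.com, and both values match foo.com.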
Squid uses splay trees to store domain name ACLs, just as it does for IP addresses. However, Squid's domain name matching algorithm presents an interesting problem for splay trees. The splay tree technique requires that only one key can match any particular search term. For example, let's say the search term (from a URL) is i.am.lrrr.org. This hostname would be a match for both .lrrr.org and .am.lrrr.org. The fact that two ACL values match one hostname confuses the splay algorithm. In other words, it is a mistake to put something like this in your configuration file:
acl Foo dstdomain .lrrr.org .am.lrrr.org
If you do, Squid generates the following warning message:
WARNING: '.am.lrrr.org' is a subdomain of '.lrrr.org'
WARNING: because of this '.am.lrrr.org' is ignored to keep splay tree searching predictable
WARNING: You should probably remove '.am.lrrr.org' from the ACL named 'Foo'
You should follow Squid's advice in this case. Remove one of the related domains so that Squid does exactly what you intend. Note that you can use both domain names as long as you put them in different ACLs:
acl Foo dstdomain .lrrr.org
acl Bar dstdomain .am.lrrr.org
This is allowed because each named ACL uses its own splay tree.
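To see why the first configuration is a problem, you can look for overlapping values yourself before Squid does. The following Python sketch is only an approximation of the idea, not the check Squid performs internally; find_overlaps is a hypothetical helper. It reports every value in a single ACL's list that is already covered by a dotted value in the same list.

```python
def find_overlaps(values):
    """Return (covered, covering) pairs within one ACL's list of domain values.

    A rough illustration only: a value counts as covered when a *different*
    dotted value in the same list would already match it.
    """
    overlaps = []
    for child in values:
        for parent in values:
            if child == parent or not parent.startswith("."):
                continue
            # A dotted parent covers its bare domain and every subdomain.
            if child.lstrip(".") == parent[1:] or child.endswith(parent):
                overlaps.append((child, parent))
    return overlaps


if __name__ == "__main__":
    # Both values in one ACL, as in the Foo example: one overlap is reported.
    print(find_overlaps([".lrrr.org", ".am.lrrr.org"]))
    # The same values split across two ACLs: each list is checked on its own.
    print(find_overlaps([".lrrr.org"]), find_overlaps([".am.lrrr.org"]))
```

Because each list is examined separately, splitting the two values into Foo and Bar produces no overlap, which mirrors the one-tree-per-ACL behavior described above.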
Usernames
Used by: ident, proxy_auth
ACLs of this type are designed to match usernames. Squid may learn a username through the RFC 1413 ident protocol or via HTTP authentication headers. Usernames must be matched exactly. For example, bob doesn't match bobby. Squid also has related ACLs (ident_regex and proxy_auth_regex) that use regular-expression pattern matching on usernames.
You can use the word REQUIRED as a special value to match any username. If Squid can't determine the username, the ACL isn't matched. This is how Squid is usually configured when using username-based access controls.
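The username rules are simple enough to model in a few lines. This Python sketch is an illustration under stated assumptions, not Squid's implementation: username_acl_matches is a hypothetical function, and an unknown username is represented here by None or an empty string.

```python
REQUIRED = "REQUIRED"  # the special value that matches any known username


def username_acl_matches(username, acl_values):
    """Illustrative model of ident/proxy_auth matching (not Squid's code)."""
    # If the username could not be determined, the ACL never matches.
    if not username:
        return False
    # REQUIRED matches any username, as long as one is known.
    if REQUIRED in acl_values:
        return True
    # Otherwise the username must match one of the values exactly.
    return username in acl_values


if __name__ == "__main__":
    print(username_acl_matches("bob", ["bobby"]))     # exact match only
    print(username_acl_matches("bob", ["bob", "al"]))
    print(username_acl_matches("alice", [REQUIRED]))
    print(username_acl_matches(None, [REQUIRED]))     # unknown username
```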
Regular expressions
Used by: srcdom_regex, dstdom_regex, url_regex, urlpath_regex, browser, referer_regex, ident_regex, proxy_auth_regex, req_mime_type, rep_mime_type
A number of ACLs use regular expressions (regex) to match character strings. (For a complete regular-expression reference, see O'Reilly's Mastering Regular Expressions.) For Squid, the most commonly used regex features match the beginning and/or end of a string. For example, the ^ character is special because it matches the beginning of a line or string:
^http://
This regex matches any URL that begins with http://. The $ character is also special because it matches the end of a line or string:
.jpg$
Actually, the previous example is slightly wrong because the . character is special too. It is a wildcard that matches any character. What we really want is this:
\.jpg$
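You can try the three patterns shown so far with Python's re module. Treat this as a rough check rather than a statement about Squid's regex library: Python's syntax differs from it in places, although these simple anchored patterns should behave the same way in both. The regex_matches helper and the sample URLs are made up for this illustration.

```python
import re


def regex_matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    jpg_url = "http://www.lrrr.org/photo.jpg"
    odd_url = "http://www.lrrr.org/notajpg"   # ends in "jpg" but has no dot

    # ^ anchors the pattern to the beginning of the string.
    print(regex_matches(r"^http://", jpg_url))            # True
    print(regex_matches(r"^http://", "ftp://lrrr.org/"))  # False

    # An unescaped dot matches any character, so "ajpg" at the end matches.
    print(regex_matches(r".jpg$", odd_url))               # True

    # Escaping the dot requires a literal "." before "jpg".
    print(regex_matches(r"\.jpg$", odd_url))              # False
    print(regex_matches(r"\.jpg$", jpg_url))              # True
```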
The backslash escapes the . so that it loses its special meaning. The regex \.jpg$ matches any string that ends with .jpg. If you don't use the ^ or $ characters, regular expressions behave like standard substring searches. They match an occurrence of the word (or words) anywhere in