Online Book Reader


Webbots, Spiders, and Screen Scrapers - Michael Schrenk [47]

things, saves bandwidth on large networks by caching frequently downloaded images.[30] Squid, along with most other proxies, also converts private network IP addresses into a single public address through a process called Network Address Translation (NAT).

A side effect of proxy use is that proxies create a potentially anonymous browsing environment because individual network addresses are pooled into a single network address. Since only the proxied network address is visible to web servers, the identities of the individual surfers remain unknown. Anonymity is the focus of this chapter, but before we start that discussion, a quick review of the liabilities of browsing in a non-proxied environment is in order.

Non-proxied Environments

In non-proxied network environments, web clients are totally exposed to the servers they access. This is important in terms of privacy because servers maintain records of requesting IP addresses, the files accessed, and the times they were accessed, as depicted in Figure 10-1.

Figure 10-1. Browsing in a non-proxied network environment

Additionally, web servers may store small records of browsing activity on clients' hard drives in the form of cookies.[31] By reading cookies on a user's successive visits to the same Internet domain, web servers can determine a variety of information, including previously defined browsing preferences, authentication criteria, and that user's browsing history within the domain.

Your Online Exposure

You may think that you only expose your identity online when you formally register a username and password with a website, or that your identity is only known at sites where you've registered. However, a variety of tricks are available to monitor Internet activity, even when you don't have administration rights to a website. For example, you can learn a lot about the users of community forums, news servers, or even MySpace by placing a single-pixel image, usually a transparent GIF file hosted on a server you control, on a page at one of those services. While the single-pixel image is essentially invisible, everybody who accesses a web page containing one also downloads this seemingly innocuous little image. If the hosting server is configured to log requests, each web surfer who downloads a page containing one of these single-pixel images leaves a record in that server's log file, unknowingly recording his or her IP address and file access time. Here are some of the things you can learn from these log files:

IP addresses of the web surfers accessing the page

How often someone with a specific IP address (or domain of origin) visits the page

Time of day that each web surfer visited the web page

Total traffic the web page receives

Indications of when traffic to the web page is heavy or light
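
As a rough illustration of how the items above could be pulled from a log file, here is a minimal Python sketch. It assumes the server writes Common Log Format, that the image is served from a hypothetical path `/pixel.gif`, and that the sample lines (which use documentation-only IP addresses) stand in for a real log. None of these details come from the original text.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log lines in Common Log Format (illustrative data only).
SAMPLE_LOG = """\
203.0.113.7 - - [12/Mar/2007:09:15:02 +0000] "GET /pixel.gif HTTP/1.1" 200 43
198.51.100.22 - - [12/Mar/2007:09:47:40 +0000] "GET /pixel.gif HTTP/1.1" 200 43
203.0.113.7 - - [12/Mar/2007:21:03:11 +0000] "GET /pixel.gif HTTP/1.1" 200 43
"""

# Captures the client IP, the timestamp, and the requested path.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def summarize_pixel_hits(log_text, pixel_path="/pixel.gif"):
    """Count requests for the tracking image per IP address and per hour."""
    hits_per_ip = Counter()
    hits_per_hour = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if not match or match.group("path") != pixel_path:
            continue  # skip malformed lines and requests for other files
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        hits_per_ip[match.group("ip")] += 1
        hits_per_hour[when.hour] += 1
    return {
        "total": sum(hits_per_ip.values()),   # total traffic to the page
        "per_ip": dict(hits_per_ip),          # visit frequency by address
        "per_hour": dict(hits_per_hour),      # heavy and light periods
    }


summary = summarize_pixel_hits(SAMPLE_LOG)
print(summary)
```

Counting by hour is only one way to spot heavy and light periods; a real log would probably call for grouping by day or week as well.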

Once you have a visitor's IP address, you can also try to identify his or her ISP by performing a reverse DNS lookup, which converts an IP address into its domain of origin. Often, a reverse DNS lookup reveals only someone's ISP, like EarthLink or AOL, and since so many people (from all over the world) use these ISPs, that information isn't very useful. Other times, however, the domain name will give you the name of a specific corporation or organization that downloaded the web page.[32]
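
A reverse DNS lookup of this kind can be sketched in Python with the standard library's `socket.gethostbyaddr`. The helper names `reverse_dns` and `domain_of_origin` are hypothetical, and trimming a hostname to its last two labels is a simplifying assumption that does not hold for every domain (country-code suffixes such as `co.uk` need more care).

```python
import socket


def reverse_dns(ip_address, resolver=socket.gethostbyaddr):
    """Return the hostname registered for ip_address, or None if none is found.

    The resolver parameter exists only so the lookup can be swapped out
    (for example, in tests); by default the system resolver is used.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip_address)
    except (socket.herror, socket.gaierror):
        return None  # many addresses have no reverse DNS record at all
    return hostname


def domain_of_origin(hostname, labels=2):
    """Naively reduce a hostname to its last `labels` dot-separated parts."""
    return ".".join(hostname.rstrip(".").split(".")[-labels:])


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-only address, so this will usually
    # print None; substitute an address taken from your own log file.
    print(reverse_dns("203.0.113.7"))
```

When the result names an ISP's address pool, it says little about the individual visitor; when it names a corporate or organizational domain, it can be far more revealing.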

You can also configure the server that hosts the single-pixel image to write a cookie on the hard drive of the web surfer. With this cookie, you can determine when an individual user gains access to web pages. If you place your single-pixel image on many web pages that are visited by a specific Internet user, you can track much of that user's browsing activity.
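
To make the cookie mechanism concrete, the following Python sketch builds the response a server might send for the single-pixel image. The cookie name `visitor_id`, the one-year lifetime, and the function `pixel_response` are all assumptions made for illustration; the image bytes are a commonly used 1x1 transparent GIF. It covers only the header logic, not a complete web server.

```python
import base64
import uuid
from http.cookies import SimpleCookie

# A commonly used 1x1 transparent GIF, stored as base64.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

COOKIE_NAME = "visitor_id"           # hypothetical cookie name
ONE_YEAR_SECONDS = 60 * 60 * 24 * 365  # assumed cookie lifetime


def pixel_response(request_cookie_header=""):
    """Return (visitor_id, headers, body) for one request for the image.

    A returning visitor is recognized by the cookie the browser sends back;
    a first-time visitor is assigned a new random identifier.
    """
    incoming = SimpleCookie()
    incoming.load(request_cookie_header or "")

    headers = [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL_GIF))),
    ]

    if COOKIE_NAME in incoming:
        visitor_id = incoming[COOKIE_NAME].value
    else:
        visitor_id = uuid.uuid4().hex
        outgoing = SimpleCookie()
        outgoing[COOKIE_NAME] = visitor_id
        outgoing[COOKIE_NAME]["path"] = "/"
        outgoing[COOKIE_NAME]["max-age"] = ONE_YEAR_SECONDS
        headers.append(("Set-Cookie", outgoing[COOKIE_NAME].OutputString()))

    return visitor_id, headers, PIXEL_GIF


# First visit: no cookie is sent, so the server issues one.
first_id, first_headers, body = pixel_response()
# Later visit: the browser returns the cookie and the same visitor is recognized.
second_id, second_headers, _ = pixel_response(f"{COOKIE_NAME}={first_id}")
```

Because the same identifier comes back with every request for the image, recording it alongside the referring page is what would let visits to different pages be tied to one browser.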

If you think these threats to one's privacy are too theoretical, consider what happens on a larger scale with online advertising companies like MySpace, Google, DoubleClick, and SpecificClick. Given the large number of web pages on which these companies' advertisements appear, they are capable of tracking a very large percentage of your online activity. Just consider how many of the websites you visit have advertisements. Then look at your browser's cookie records (usually available in the
