Online Book Reader

Webbots, Spiders, and Screen Scrapers - Michael Schrenk [64]

are apt to be different.

Further Exploration

Now that you know how to use webbots to interface with newsgroups, here is a list of ideas you can use to develop news bots for your own purposes.

Develop a newsgroup clipping service. This service could monitor numerous newsgroups for mention of specific keywords and either aggregate that information in a database or send email alerts when a keyword appears in a newsgroup.

Build a web-based newsgroup portal, similar to http://groups.google.com.

Create a webbot that gathers weather forecasts for Las Vegas from the National Weather Service website and posts this weather information for vacationers on alt.vacation.las-vegas.[48]

Monitor newsgroups for unauthorized use of intellectual property.

Create a database that archives a newsgroup.

Write a web-based newsgroup client that allows users to read newsgroups anonymously.

* * *

[48] Due to the ridiculous amounts of spam on newsgroups, scripts for posting articles on newsgroups were deliberately omitted from this chapter. However, between the scripts used as examples in this chapter and the original NNTP RFC, you should be able to figure out how to post articles to newsgroups on your own.

Chapter 15. WEBBOTS THAT READ EMAIL

When a webbot can read email, it's easier for it to communicate with the outside world.[49] Webbots capable of reading email can take instruction via email commands, share data with handheld devices like BlackBerries and Palm PDAs, and filter messages for content.

For example, if package-tracking information is sent to an email account that a webbot can access, the webbot can parse incoming email from the carrier to track delivery status. Such a webbot could also send email warnings when shipments are late, communicate shipping charges to your corporate accounting software, or create reports that analyze a company's use of overnight shipping.

The POP3 Protocol

Of the many protocols for reading email from mail servers, I selected Post Office Protocol 3 (POP3) for this task because of its simplicity and near-universal support among mail servers. POP3 instructions are also easy to perform in any Telnet or standard TCP/IP terminal program.[50] Executing POP3 commands by hand in Telnet is a good way to understand them before we convert them into PHP routines that any webbot may execute.

Logging into a POP3 Mail Server

Listing 15-1 shows how to connect to a POP3 mail server through a Telnet client. Simply enter telnet, followed by the mail server name and the port number (110 is the standard port for POP3). The mail server should reply with a message similar to the one in Listing 15-1.

telnet mail.server.net 110

+OK <9238.1142228@mail2.server.net>

Listing 15-1: Making a Telnet connection to a POP3 mail server
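The same connection step can be sketched in code. The example below is a minimal Python illustration, not the PHP routines this chapter builds later; the function names read_greeting and connect, the POP3_PORT constant, and the ten-second timeout are all assumptions made for this sketch. It opens a TCP connection and checks that the first line the server sends begins with +OK, as in Listing 15-1.

```python
import socket

POP3_PORT = 110  # standard port for unencrypted POP3


def read_greeting(reader):
    """Read the server's one-line greeting and return it without the CRLF.

    `reader` is any binary file-like object with a readline() method.
    """
    line = reader.readline().decode("ascii", errors="replace").rstrip("\r\n")
    if not line.startswith("+OK"):
        raise ConnectionError(f"unexpected POP3 greeting: {line!r}")
    return line


def connect(host, port=POP3_PORT, timeout=10.0):
    """Open a TCP connection to a POP3 server and return (socket, greeting).

    Hypothetical helper for illustration; the caller must close the socket.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        greeting = read_greeting(sock.makefile("rb"))
    except Exception:
        sock.close()
        raise
    return sock, greeting
```

Calling connect("mail.server.net") would play the role of the telnet command in Listing 15-1; keeping read_greeting separate from the socket code makes the greeting check easy to try against canned data.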

The reply shown in Listing 15-1 says that you've made a connection to the POP3 mail server and that it is waiting for its next command, which should be your attempt to log in. Listing 15-2 shows the process for logging in to a POP3 mail server.

user me@server.com

+OK

pass xxxxxxxx

+OK

Listing 15-2: Successful authentication to a POP3 mail server

When you try this, be sure to substitute your email account in place of me@server.com and the password associated with your account for xxxxxxxx.
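The login exchange in Listing 15-2 amounts to two commands, each answered by a single status line. The following Python sketch (an illustration only, not the chapter's PHP code) shows one way to script it. The names send_command and login are hypothetical, and stream is assumed to be a binary read/write file object such as the one returned by sock.makefile("rwb") on an already connected socket. POP3 command keywords are case-insensitive, so USER and PASS here are equivalent to the lowercase forms typed into Telnet.

```python
def send_command(stream, command):
    """Send one POP3 command and return the server's single-line reply."""
    stream.write(command.encode("ascii") + b"\r\n")  # commands end with CRLF
    stream.flush()
    return stream.readline().decode("ascii", errors="replace").rstrip("\r\n")


def login(stream, user, password):
    """Authenticate with USER, then PASS.

    Returns True only if the server answers +OK to both commands.
    """
    if not send_command(stream, f"USER {user}").startswith("+OK"):
        return False
    return send_command(stream, f"PASS {password}").startswith("+OK")
```

Note that USER and PASS travel in plain text over this kind of connection, just as they do in the Telnet session.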

If authentication fails, the mail server should return an authentication failure message, as shown in Listing 15-3.

-ERR authorization failed

Listing 15-3: POP3 authentication failure
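Because every POP3 reply starts with either +OK or -ERR, a webbot can decide success or failure by inspecting the first word of the reply. The Python sketch below is one possible way to do that; the Pop3Error class and the check_reply function are names invented for this illustration and are not part of the chapter's PHP library.

```python
class Pop3Error(Exception):
    """Raised when the server answers a command with -ERR."""


def check_reply(line):
    """Return the text after +OK, or raise Pop3Error for a -ERR reply."""
    status, _, text = line.strip().partition(" ")
    if status == "+OK":
        return text
    if status == "-ERR":
        raise Pop3Error(text)
    # Anything else is not a POP3 status line at all.
    raise ValueError(f"not a POP3 status line: {line!r}")
```

Passed the reply in Listing 15-3, check_reply raises Pop3Error with the message "authorization failed"; passed a bare +OK, it returns an empty string.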

Reading Mail from a POP3 Mail Server

Before you can download email messages from a POP3 mail server, you'll need to execute a LIST command to find out which messages are waiting. The mail server will then respond with the number of messages on the server.

The POP3 LIST Command

The LIST command will also reveal the size of the email messages and, more importantly, how to reference individual email messages on the server.

The response to the LIST command contains a line for every available message for the specified account. Each line consists of a sequential mail ID number, followed by the size of the message
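A LIST reply of this shape is straightforward to parse. The Python sketch below is an illustration rather than the chapter's PHP routine; the function name parse_list_response and the sample reply are made up for the example. It relies on two details from the POP3 specification (RFC 1939) that are not spelled out above: the reply begins with a +OK status line, and a line containing only a period marks the end of the listing.

```python
def parse_list_response(lines):
    """Turn the lines of a LIST reply into a {message_id: size} dictionary."""
    lines = [line.rstrip("\r\n") for line in lines]
    if not lines or not lines[0].startswith("+OK"):
        raise ValueError("LIST did not succeed")
    sizes = {}
    for line in lines[1:]:
        if line == ".":  # a lone period ends a multi-line reply
            break
        msg_id, size = line.split()[:2]
        sizes[int(msg_id)] = int(size)
    return sizes


# Made-up sample reply for two messages (not taken from a real server).
sample = ["+OK 2 messages\r\n", "1 120\r\n", "2 200\r\n", ".\r\n"]
print(parse_list_response(sample))  # {1: 120, 2: 200}
```

The dictionary keys are the mail ID numbers a webbot would use to refer to individual messages in later commands.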
