Webbots, Spiders, and Screen Scrapers - Michael Schrenk [65]
LIST
+OK
1 2398
2 2023
.
Listing 15-4: Results of a POP3 LIST command
The server's reply to the LIST command tells us that there are two messages on the server for the specified account. We can also tell that message 1 is the larger message, at 2,398 bytes, and that message 2 is 2,023 bytes in length. Beyond that, we don't know anything specific about any of these messages.
The last line in the response is the end-of-message indicator. Servers terminate every multi-line POP3 response, such as the replies to LIST and RETR, with a line containing only a period.
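A webbot usually needs the reply in Listing 15-4 as data rather than as text. The following is a minimal sketch of one way to do that, assuming the whole reply has already been read into a string. The function name parse_pop3_list() is hypothetical and is not part of the book's libraries.

```php
<?php
// Hypothetical helper: converts the raw text of a LIST reply into an
// array of message ID => size in bytes. Returns false on an error reply.
function parse_pop3_list($response)
{
    // Split on CRLF or a bare LF so the sketch tolerates either line ending
    $lines = preg_split('/\r?\n/', trim($response));

    // The first line is the status indicator; anything but +OK is a failure
    $status = array_shift($lines);
    if (strncmp($status, "+OK", 3) !== 0) {
        return false;
    }

    $messages = array();
    foreach ($lines as $line) {
        if ($line === ".") {
            break;                      // A lone period ends the listing
        }
        // Each entry has the form "<message ID> <size in bytes>"
        if (preg_match('/^(\d+)\s+(\d+)/', $line, $match)) {
            $messages[(int)$match[1]] = (int)$match[2];
        }
    }
    return $messages;
}

// The reply from Listing 15-4, with the line endings written out
$reply = "+OK\r\n1 2398\r\n2 2023\r\n.\r\n";
print_r(parse_pop3_list($reply));
```

Keying the array by message ID keeps each size next to the number that a later RETR command needs.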
The POP3 RETR Command
To read a specific message, enter RETR followed by a space and the mail ID received from the LIST command. The command in Listing 15-5 requests message 1.
RETR 1
Listing 15-5: Requesting a message from the server
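When a script issues RETR instead of a person at a terminal, it has to know when the reply is complete. The sketch below reads a multi-line reply up to the line containing only a period. It also removes the extra leading period that POP3 servers add to any message line that itself begins with a period. The function name read_pop3_multiline() is hypothetical, and a php://memory stream stands in for a live mail server connection so the example runs without a network.

```php
<?php
// Hypothetical helper: reads one multi-line POP3 reply from an open stream.
// Returns the text after the status line, or false on an error reply.
function read_pop3_multiline($stream)
{
    // The first line must be the +OK status indicator
    $status = fgets($stream);
    if ($status === false || strncmp($status, "+OK", 3) !== 0) {
        return false;
    }

    $text = "";
    while (($line = fgets($stream)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === ".") {
            break;                      // End-of-message indicator
        }
        // A leading period on any other line was added by the server
        if ($line !== "" && $line[0] === ".") {
            $line = substr($line, 1);
        }
        $text .= $line . "\n";
    }
    return $text;
}

// Stand-in for a server connection, loaded with a made-up, shortened reply
$stream = fopen("php://memory", "r+");
fwrite($stream, "+OK\r\nSubject: test\r\n\r\nHello\r\n.\r\n");
rewind($stream);
echo read_pop3_multiline($stream);
fclose($stream);
```

In a live session, the stream would instead be a connection to the mail server (for example, one opened with fsockopen()), and the script would write "RETR 1\r\n" to it before calling the function.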
The mail server should respond to the RETR command with a string of characters resembling the contents of Listing 15-6.
+OK 2398 octets
Return-Path:
Delivered-To: me@server.com
Received: (qmail 73301 invoked from network); 19 Feb 2006 20:55:31 -0000
Received: from mail2.server.net by mail1.server.net (qmail-ldap-1.03) with compressed QMQP; 19 Feb 2006 20:55:31 -0000
Delivered-To: CLUSTERHOST mail2.server.net me@server.com
Received: (qmail 50923 invoked from network); 19 Feb 2006 20:55:31 -0000
Received: by simscan 1.1.0 ppid: 50907, pid: 50912, t: 2.8647s scanners: attach: 1.1.0 clamav: 0.86.1/m:34/d:1107 spam: 3.0.4
Received: from web30515.mail.mud.server.com (envelope-sender by mail2.server.net (qmail-ldap-1.03) with SMTP for
Received: (qmail 7734 invoked by uid 60001); 19 Feb 2006 20:55:26 -0000
Message-ID: <20060219205526.7732.qmail@web30515.mail.mud.server.com>
Date: Sun, 19 Feb 2006 12:55:26 -0800 (PST)
From: mike schrenk
Subject: Hey, Can you read this email?
To: mike schrenk
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="0-349883719-1140382526=:7581"
Content-Transfer-Encoding: 8bit
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail2.server.com
X-Spam-Level:
X-Spam-Status: No, score=0.9 required=17.0 tests=HTML_00_10,HTML_MESSAGE, HTML_SHORT_LENGTH autolearn=no version=3.0.4

--0-349883719-1140382526=:7581
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit

This is an email sent from my Yahoo! email account.
--0-349883719-1140382526=:7581
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 8bit

This is an email sent from my Yahoo! email account.
.
Listing 15-6: A raw email message read from the server using the RETR POP3 command
As you can see, even a short email message has a lot of overhead. Most of the returned information has little to do with the actual text of a message. For example, the email message retrieved in Listing 15-6 doesn't appear until over halfway down the listing.
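One way to see this overhead is to separate the headers from the message body. In a raw email message, the first empty line marks the end of the headers, so a single split is enough. The following is a minimal sketch. The function name split_headers_and_body() is hypothetical, and the sample message is a shortened stand-in for the one in Listing 15-6.

```php
<?php
// Hypothetical helper: splits a raw message at the first empty line.
// Returns array(header block, body).
function split_headers_and_body($raw_message)
{
    // Normalize CRLF line endings so the empty line is always "\n\n"
    $raw_message = str_replace("\r\n", "\n", $raw_message);

    // Everything before the first empty line is the header block
    $parts   = explode("\n\n", $raw_message, 2);
    $headers = $parts[0];
    $body    = isset($parts[1]) ? $parts[1] : "";

    return array($headers, $body);
}

// Shortened stand-in for the raw message in Listing 15-6
$raw_message = "Subject: Hey, Can you read this email?\r\n"
             . "MIME-Version: 1.0\r\n"
             . "\r\n"
             . "This is an email sent from my Yahoo! email account.\r\n";

list($headers, $body) = split_headers_and_body($raw_message);
printf("%d header bytes, %d body bytes\n", strlen($headers), strlen($body));
```

For a multipart message like the one in Listing 15-6, the body returned here still contains each part's boundary line and its own short set of headers, so extracting only the plain text takes a further pass.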
The rest of the text returned by the mail server consists of headers, which tell the mail client the path the message took, which services touched it (like SpamAssassin), how to display or handle the message, to whom to send replies, and so forth. These headers include some familiar information such as the subject header, the to and from values, and the MIME version. You can easily parse this information with the return_between() function found in the LIB_parse library (see Chapter 4), as shown in Listing 15-7.
# Each value is the text between the header name and the end of that line
$ret_path   = return_between($raw_message, "Return-Path: ", "\n", EXCL);
$deliver_to = return_between($raw_message, "Delivered-To: ", "\n", EXCL);
$date       = return_between($raw_message, "Date: ", "\n", EXCL);
$from       = return_between($raw_message, "From: ", "\n", EXCL);
$subject    = return_between($raw_message, "Subject: ", "\n", EXCL);
Listing 15-7: Parsing header values
Each header value in Listing 15-7 is delimited by its header name at the start and a \n (newline) character at the end.
Note
--0-349883719-1140382526=:7581--