
Webbots, Spiders, and Screen Scrapers - Michael Schrenk [62]

40 seconds to download them all, so expect a short delay when requesting large amounts of data.) For each group, the server responds with the name of the group, the identifier of the last article, the identifier of the first article (RFC 977 lists the last article before the first), and a y if you can post articles to this group or an n if posting articles to this group (on this server) is prohibited.
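To make the response format concrete, here is a minimal sketch that splits one line of LIST output into its four fields. The function name parse_nntp_group_line(), the array keys, and the sample line are hypothetical and not part of LIB_nntp; the field order (group, last, first, posting flag) is the one given in RFC 977.

```php
<?php
# Hypothetical helper (not part of LIB_nntp): split one LIST response line
# into its fields. RFC 977 gives the order as: group last first p
function parse_nntp_group_line($line)
{
    # Expect: name, two article numbers, and a posting flag
    if (!preg_match('/^(\S+)\s+(\d+)\s+(\d+)\s+(\S+)\s*$/', $line, $m))
        return false;                        // Status line, terminator, or malformed line

    return array(
        'GROUP'         => $m[1],
        'LAST_ARTICLE'  => (int)$m[2],
        'FIRST_ARTICLE' => (int)$m[3],
        'CAN_POST'      => ($m[4] == "y")
    );
}

# Sample line, made up for illustration
$info = parse_nntp_group_line("alt.vacation.las-vegas 0000012345 0000012001 y\r\n");
echo $info['GROUP'] . ": articles " . $info['FIRST_ARTICLE'] . " to " . $info['LAST_ARTICLE'] . "\n";
```

Lines that do not match the four-field layout, such as the server's status line or the terminating period, return false, so the function can be applied to every element of the array without special cases.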

News servers terminate messages by sending a line that contains just a period (.), which you can see in the last array element in Figure 14-2. That lone period is the only sign your webbot will receive to tell it to stop looking for data. If your webbot reads buffers incorrectly, it will either hang indefinitely or return with incomplete data. The small function shown in Listing 14-2 (found in LIB_nntp) correctly reads data from an open NNTP network socket and recognizes the end-of-message indicator.

function read_nntp_buffer($socket)
{
    $this_line = "";
    $buffer = "";
    while ($this_line != ".\r\n")        // Read until lone . found on line
    {
        $this_line = fgets($socket);     // Read line from socket
        if ($this_line === false)        // Stop if the connection closes or times out
            break;
        $buffer = $buffer . $this_line;
    }
    return $buffer;
}

Listing 14-2: Reading NNTP data and identifying the end of messages
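One way to check this kind of end-of-message logic without a live news server is to feed it a canned response from an in-memory stream. The sketch below is a variant written for illustration, not the LIB_nntp function: read_nntp_lines() is a hypothetical name, it returns an array of lines rather than one string, it stops if fgets() reports the end of the stream, and it undoes the period doubling that RFC 977 applies to text lines beginning with a period.

```php
<?php
# Hypothetical variant of read_nntp_buffer() (not part of LIB_nntp)
function read_nntp_lines($socket)
{
    $lines = array();
    while (($this_line = fgets($socket)) !== false)   // false: stream ended or timed out
    {
        $this_line = rtrim($this_line, "\r\n");
        if ($this_line === ".")                       // Lone period ends the message
            break;
        if (substr($this_line, 0, 2) === "..")        // Undo period doubling (RFC 977)
            $this_line = substr($this_line, 1);
        $lines[] = $this_line;
    }
    return $lines;
}

# Simulate a server response with an in-memory stream (sample data is made up)
$stream = fopen("php://memory", "w+");
fwrite($stream, "215 list of newsgroups follows\r\n");
fwrite($stream, "alt.test 0000000042 0000000001 y\r\n");
fwrite($stream, ".\r\n");
fwrite($stream, "this line belongs to a later response\r\n");
rewind($stream);

print_r(read_nntp_lines($stream));   // Reading stops at the lone period
fclose($stream);
```

Because the loop ends on either the lone period or a failed read, a dropped connection produces incomplete data rather than a webbot that hangs.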

The script in Listing 14-1 uses the function get_nntp_groups() to get an array of available groups hosted by your news server. The script for that function is shown below in Listing 14-3.

function get_nntp_groups($server)
{
    # Open socket connection to the news server
    $fp = fsockopen($server, 119, $errno, $errstr, 30);
    if (!$fp)
    {
        # If socket error, return the error instead of a list of groups
        $groups_array['ERROR'] = "ERROR: $errstr ($errno)";
    }
    else
    {
        # Else tell server to return a list of hosted newsgroups
        $out = "LIST\r\n";
        fputs($fp, $out);
        $groups = read_nntp_buffer($fp);
        $groups_array = explode("\r\n", $groups);   // Convert to an array

        # Only log out and close if the socket was actually opened
        fputs($fp, "QUIT\r\n");                     // Log out
        fclose($fp);                                // Close socket
    }
    return $groups_array;
}

Listing 14-3: A function that finds available newsgroups on a news server

As you'll learn, all NNTP commands follow a structure similar to the one used in Listing 14-3. Most NNTP commands require that you do the following:

Connect to the server (on port 119)

Issue a command, like LIST (followed by a carriage return/line feed)

Read the results (until encountering a line with a lone period)

End the session with a QUIT command

Close the network socket
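The five steps above can be sketched as a pair of general-purpose functions. This is only an illustration under stated assumptions: nntp_exchange() and nntp_command() are hypothetical names, not part of LIB_nntp, and the approach suits only commands such as LIST whose response ends with a lone period. The socket handling is split out so the command logic can be exercised without a live server.

```php
<?php
# Hypothetical helpers (not part of LIB_nntp). Suitable only for commands whose
# response is terminated by a line containing a lone period, such as LIST.

function nntp_exchange($fp, $command)
{
    # Issue the command, followed by a carriage return/line feed
    fputs($fp, $command . "\r\n");

    # Read the results until encountering a line with a lone period
    $lines = array();
    while (($this_line = fgets($fp)) !== false && $this_line != ".\r\n")
        $lines[] = rtrim($this_line, "\r\n");

    # End the session with a QUIT command
    fputs($fp, "QUIT\r\n");

    return $lines;
}

function nntp_command($server, $command)
{
    # Connect to the server (on port 119)
    $fp = fsockopen($server, 119, $errno, $errstr, 30);
    if (!$fp)
        return array('ERROR' => "ERROR: $errstr ($errno)");

    $lines = nntp_exchange($fp, $command);

    # Close the network socket
    fclose($fp);

    return $lines;
}
```

With these in place, a call such as nntp_command("your.news.server", "LIST") would return the server's reply as an array of lines, including any greeting and status lines that precede the group list, or an array with an ERROR element if the connection fails.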

Other NNTP commands that identify groups hosted by news servers are listed in RFC 977. You can use the basic structure of get_nntp_groups() as a guide to creating other functions that execute NNTP commands found in RFC 977.

Finding Articles in Newsgroups

As you read earlier, newsgroup articles are distributed to every news server that hosts a particular newsgroup, and each of those servers stores its own copy. Each article has a sequential numeric identifier that identifies it on a particular news server. You may request the range of numeric identifiers for the articles in a given newsgroup with a script similar to the one in Listing 14-4.

include("LIB_nntp.php");

# Request article IDs
$server = "your.news.server";
$newsgroup = "alt.vacation.las-vegas";
$ids_array = get_nntp_article_ids($server, $newsgroup);

# Report Results
echo "\nInfo about articles in $newsgroup on $server\n";
echo "Code: ". $ids_array['RESPONSE_CODE']."\n";
echo "Estimated # of articles: ". $ids_array['EST_QTY_ARTICLES']."\n";
echo "First article ID: ". $ids_array['FIRST_ARTICLE']."\n";
echo "Last article ID: ". $ids_array['LAST_ARTICLE']."\n";

Listing 14-4: Requesting article IDs from a news server
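The body of get_nntp_article_ids() is not shown here, but the array keys it returns line up with the single-line reply that RFC 977 defines for the GROUP command: 211 n f l s, where n is the estimated number of articles, f is the first article number, l is the last article number, and s is the group name. As a rough sketch, and not the LIB_nntp implementation, a reply of that form could be mapped onto those keys as follows. The function parse_group_response() and the sample replies are hypothetical.

```php
<?php
# Hypothetical helper (not the LIB_nntp implementation): map a GROUP reply
# of the form "211 n f l s" (RFC 977) onto the keys used in Listing 14-4.
function parse_group_response($response)
{
    $fields = preg_split('/\s+/', trim($response));
    $ids_array = array('RESPONSE_CODE' => $fields[0]);

    if ($fields[0] == "211" && count($fields) >= 4)   // 211 = group selected
    {
        $ids_array['EST_QTY_ARTICLES'] = (int)$fields[1];
        $ids_array['FIRST_ARTICLE']    = (int)$fields[2];
        $ids_array['LAST_ARTICLE']     = (int)$fields[3];
    }
    return $ids_array;
}

# Sample replies, made up for illustration
print_r(parse_group_response("211 345 12001 12345 alt.vacation.las-vegas\r\n"));
print_r(parse_group_response("411 no such news group\r\n"));   // Only RESPONSE_CODE is set
```

Unlike LIST, a GROUP reply is a single line with no terminating period, so a reader that waits for a lone period would not be appropriate for it.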

The result of running the script in Listing 14-4 is shown in Figure 14-3.

Figure 14-3. Executing get_nntp_article_ids() and displaying the results

This function returns data in an array, with elements containing a status code,[46] the estimated quantity of articles for that group on the server, the identifier of the first article in the newsgroup, and the identifier of the last article in the newsgroup. An estimate of the number of articles is provided because
