Webbots, Spiders, and Screen Scrapers - Michael Schrenk [63]

some articles are deleted after submission, so not every article within the given range is actually available. It's also worth noting that each server will have its own rules for when articles become obsolete, so each server will have a different number of articles for any one newsgroup. The code that actually reads the article identifiers from the server is shown in Listing 14-5.

function get_nntp_article_ids($server, $newsgroup)
{
    $return_array = array();

    # Open socket connection to the news server (NNTP listens on port 119)
    $socket = fsockopen($server, 119, $errno, $errstr, 30);
    if (!$socket)
    {
        # If socket error, report it and stop; there is no connection to close
        $return_array['ERROR'] = "ERROR: $errstr ($errno)";
        return $return_array;
    }

    # Read the server's one-line greeting so it is not mistaken for a reply
    fgets($socket, 2000);

    # Tell server which group to connect to; the reply to GROUP also
    # carries the range of available articles for this group
    fputs($socket, "GROUP ".$newsgroup."\r\n");
    $res = trim(fgets($socket, 2000));
    $return_array['GROUP_MESSAGE'] = $res;

    $array = explode(" ", $res);
    $return_array['RESPONSE_CODE']    = $array[0];
    $return_array['EST_QTY_ARTICLES'] = $array[1];
    $return_array['FIRST_ARTICLE']    = $array[2];
    $return_array['LAST_ARTICLE']     = $array[3];

    # Sign out and close the socket
    fputs($socket, "QUIT\r\n");
    fclose($socket);
    return $return_array;
}

Listing 14-5: The function get_nntp_article_ids()
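
The reply that carries these numbers has a fixed shape in NNTP: a 211 status code followed by the estimated article count, the first article number, the last article number, and the group name. A minimal sketch of parsing one such line, independent of any socket code, might look like the following. The function name parse_group_response() and the sample reply line are illustrative assumptions and are not part of LIB_nntp.

```php
<?php
# Hypothetical helper (not part of LIB_nntp): parse one NNTP GROUP reply line,
# for example "211 1234 3000234 3002322 misc.test"
function parse_group_response($line)
{
    # Split on runs of whitespace so extra spaces do not shift the fields
    $fields = preg_split('/\s+/', trim($line));

    # A successful GROUP reply starts with status code 211
    if (count($fields) < 4 || $fields[0] !== "211")
    {
        return array('ERROR' => "Unexpected GROUP reply: " . trim($line));
    }

    return array(
        'RESPONSE_CODE'    => $fields[0],
        'EST_QTY_ARTICLES' => (int) $fields[1],
        'FIRST_ARTICLE'    => (int) $fields[2],
        'LAST_ARTICLE'     => (int) $fields[3],
    );
}

# Sample reply line, made up for illustration
$info = parse_group_response("211 1234 3000234 3002322 misc.test\r\n");
echo $info['FIRST_ARTICLE'] . " to " . $info['LAST_ARTICLE'] . "\n";
```

Because articles are deleted after submission, the span from FIRST_ARTICLE to LAST_ARTICLE is only an upper bound on what the server can actually deliver; a script that walks the range should expect some article numbers to be missing.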

Reading an Article from a Newsgroup

Once you know the range of valid article identifiers for your newsgroup (on your news server), you can request an individual article. For example, the script in Listing 14-6 reads article number 562340 from the group alt.vacation.las-vegas.

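include("LIB_nntp.php");

$server     = "your.news.server";
$newsgroup  = "alt.vacation.las-vegas";
$article_id = 562340;

$article = read_nntp_article($server, $newsgroup, $article_id);

# Show the connection error, if any; otherwise show the header and the article
if (isset($article['ERROR']))
    echo $article['ERROR'];
else
{
    echo $article['HEAD'];
    echo $article['ARTICLE'];
}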

Listing 14-6: Reading and displaying an article from a news server

When you execute the code in Listing 14-6, you'll see a screen similar to the one in Figure 14-4. On my news server, article 562340 is the same article displayed in the screenshot of the Thunderbird news reader, shown earlier in Figure 14-1.[47]

Figure 14-4. Reading a newsgroup article

The first part of Figure 14-4 shows the NNTP header, which, like a mail or HTTP header, returns status information about the article. Following the header is the article. Notice that in the header and at the beginning of the article, the article is also referred to by a longer message identifier. Unlike the server-dependent identifier used in the previous function call, this longer identifier is universal and references this article on any news server that hosts this newsgroup.

The function called to read the news article is shown in Listing 14-7.

function read_nntp_article($server, $newsgroup, $article)
{
    $return_array = array();

    # Open socket connection to the news server (NNTP listens on port 119)
    $socket = fsockopen($server, 119, $errno, $errstr, 30);
    if (!$socket)
    {
        # If socket error, report it and stop; there is no connection to close
        $return_array['ERROR'] = "ERROR: $errstr ($errno)";
        return $return_array;
    }

    # Tell server which group to connect to
    fputs($socket, "GROUP ".$newsgroup."\r\n");

    # Request this article's HEAD
    fputs($socket, "HEAD $article\r\n");
    $return_array['HEAD'] = read_nntp_buffer($socket);

    # Request the article
    fputs($socket, "BODY $article\r\n");
    $return_array['ARTICLE'] = read_nntp_buffer($socket);

    fputs($socket, "QUIT\r\n");   // Sign out (newsgroup server)
    fclose($socket);              // Close socket
    return $return_array;         // Return data array
}

Listing 14-7: A function that reads a newsgroup article
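
Replies to HEAD and BODY span many lines. In NNTP, a multiline reply ends with a line containing only a period, and any content line that itself begins with a period is sent with one extra leading period. Listing 14-7 delegates this work to read_nntp_buffer(), whose source is not shown here. The sketch below is a hypothetical stand-in (the name read_multiline_reply() is an assumption) showing one way such a reader could be written. It accepts any readable stream, so it can be tried against an in-memory stream without a news server.

```php
<?php
# Hypothetical stand-in for a multiline reply reader; this is NOT the
# book's read_nntp_buffer(). It reads lines until the lone "." terminator.
function read_multiline_reply($stream)
{
    $text = "";
    while (($line = fgets($stream)) !== false)
    {
        # Strip the line ending so the terminator can be compared exactly
        $line = rtrim($line, "\r\n");

        # A line holding only "." marks the end of the reply
        if ($line === ".")
        {
            break;
        }

        # Undo dot-stuffing: the sender prepends an extra "." to any
        # line that starts with one, so drop the first character
        if ($line !== "" && $line[0] === ".")
        {
            $line = substr($line, 1);
        }

        # The first line read is the status line; it is kept in the result
        $text .= $line . "\n";
    }
    return $text;
}

# Try it on an in-memory stream holding a made-up reply
$stream = fopen("php://memory", "w+");
fwrite($stream, "222 0 body follows\r\nHello\r\n..hidden dot\r\n.\r\nignored\r\n");
rewind($stream);
echo read_multiline_reply($stream);
fclose($stream);
```

Stopping at the terminator line, rather than reading a fixed number of bytes, leaves the stream positioned at the start of the next reply, which matters when several commands are sent over one connection.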

As mentioned earlier, NNTP was designed for use on older (slower) networks. For this reason, the article headers are available separately from the actual articles. This allowed news readers to download article headers first, to show users which articles were available on their news servers. If an article interested the viewer, that article alone was downloaded, consuming minimum bandwidth.
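
To illustrate the headers-first approach, a news reader needs only a few fields (such as Subject and From) from each header block to build its list of articles. The following is a minimal sketch of pulling those fields out of header text like that returned in $article['HEAD']. The function name parse_article_header() and the sample header block are assumptions made for illustration; real headers vary from server to server.

```php
<?php
# Hypothetical helper: turn raw header text into a name => value array.
# Header names are lowercased so lookups do not depend on the server's casing.
function parse_article_header($head_text)
{
    $headers = array();
    $last = null;
    foreach (preg_split('/\r\n|\n/', $head_text) as $line)
    {
        if ($last !== null && preg_match('/^[ \t]+\S/', $line))
        {
            # A line starting with whitespace continues the previous header
            $headers[$last] .= " " . trim($line);
        }
        elseif (preg_match('/^([A-Za-z0-9-]+):\s*(.*)$/', $line, $m))
        {
            $last = strtolower($m[1]);
            $headers[$last] = $m[2];
        }
        # Anything else (such as the numeric status line) is skipped
    }
    return $headers;
}

# Made-up header block for illustration
$head = "221 562340 <example-id@news.example> head\r\n"
      . "From: someone@example.com\r\n"
      . "Subject: Best buffet\r\n"
      . " on the Strip?\r\n"
      . "Message-ID: <example-id@news.example>\r\n";

$h = parse_article_header($head);
echo $h['subject'] . "\n";
echo $h['message-id'] . "\n";
```

With the fields in an array, a script can show a list of subjects and request the BODY only for the articles a user selects.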

* * *

[46] There is a full list of NNTP status codes in Appendix B.

[47] Remember that article IDs are unique to newsgroups on each specific news server. Your article IDs
