Webbots, Spiders, and Screen Scrapers - Michael Schrenk [63]
function get_nntp_article_ids($server, $newsgroup)
{
# Open socket connection to the news server
$socket = fsockopen($server, $port="119", $errno, $errstr, 30);
if (!$socket)
{
# If socket error, issue error
$return_array['ERROR'] = "ERROR: $errstr ($errno)";
}
else
{
# Else tell server which group to connect to
fputs($socket, "GROUP ".$newsgroup." \r\n");
$return_array['GROUP_MESSAGE'] = trim(fread($socket, 2000));
# Get the range of available articles for this group
fputs($socket, "NEXT \r\n");
$res = fread($socket, 2000);
$array = explode(" ", $res);
$return_array['RESPONSE_CODE'] = $array[0];
$return_array['EST_QTY_ARTICLES'] = $array[1];
$return_array['FIRST_ARTICLE'] = $array[2];
$return_array['LAST_ARTICLE'] = $array[3];
# Sign out and close the socket (only when the connection succeeded)
fputs($socket, "QUIT \r\n");
fclose($socket);
}
return $return_array;
}
Listing 14-5: The function get_nntp_article_ids()
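The fields this function extracts follow the layout of an NNTP GROUP reply, which RFC 3977 gives as the status code 211 followed by the estimated article count, the first article number, the last article number, and the group name. As a minimal sketch of that parsing step on its own, with no network connection involved, something like the following could be used. The helper name parse_group_response() is hypothetical (it is not part of LIB_nntp), and the sample reply line is made up for illustration.

```php
<?php
# Hypothetical helper (not part of LIB_nntp): parse one NNTP GROUP reply line.
# Assumes the "211 count first last group" layout described in RFC 3977.
function parse_group_response($line)
    {
    # Split on runs of whitespace so stray extra spaces don't shift the fields
    $fields = preg_split('/\s+/', trim($line));
    if (count($fields) < 4 || $fields[0] !== "211")
        {
        # Anything other than a complete 211 reply is treated as a failure
        return array('ERROR' => "Unexpected response: ".trim($line));
        }
    return array(
        'RESPONSE_CODE'    => $fields[0],
        'EST_QTY_ARTICLES' => (int)$fields[1],
        'FIRST_ARTICLE'    => (int)$fields[2],
        'LAST_ARTICLE'     => (int)$fields[3]
        );
    }

# Made-up sample reply, for illustration only
$ids = parse_group_response("211 1234 562000 563233 alt.vacation.las-vegas\r\n");
echo "Articles ".$ids['FIRST_ARTICLE']." through ".$ids['LAST_ARTICLE']."\n";
```

Checking the status code before reading the remaining fields keeps an error reply (for example, one for a group the server does not carry) from being mistaken for an article range.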
Reading an Article from a Newsgroup
Once you know the range of valid article identifiers for your newsgroup (on your news server), you can request an individual article. For example, the script in Listing 14-6 reads article number 562340 from the group alt.vacation.las-vegas.
include("LIB_nntp.php");
$server = "your.news.server";
$newsgroup = "alt.vacation.las-vegas";
$article = read_nntp_article($server, $newsgroup, $article=562340);
echo $article['HEAD'];
echo $article['ARTICLE'];
Listing 14-6: Reading and displaying an article from a news server
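Because an article number is only meaningful within the range a particular server reports, it can help to compare the number against that range before requesting the article. The following is a minimal sketch, assuming an array shaped like the one returned by get_nntp_article_ids() in Listing 14-5. The helper name article_in_range() is hypothetical, and the range shown is made up.

```php
<?php
# Hypothetical helper (not part of LIB_nntp): check an article number against
# the array layout returned by get_nntp_article_ids() in Listing 14-5.
function article_in_range($ids, $article)
    {
    # A failed lookup carries an ERROR key and no usable range
    if (isset($ids['ERROR']) || !isset($ids['FIRST_ARTICLE'], $ids['LAST_ARTICLE']))
        {
        return false;
        }
    return $article >= (int)$ids['FIRST_ARTICLE']
        && $article <= (int)$ids['LAST_ARTICLE'];
    }

# Made-up range, standing in for a real get_nntp_article_ids() result
$ids = array('FIRST_ARTICLE' => "562000", 'LAST_ARTICLE' => "563233");
echo article_in_range($ids, 562340) ? "Safe to request\n" : "Out of range\n";
```

A number inside the range may still be missing on the server (articles can expire or be cancelled), so this check only rules out requests that cannot succeed.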
When you execute the code in Listing 14-6, you'll see a screen similar to the one in Figure 14-4. On my news server, article 562340 is the same article displayed in the screenshot of the Thunderbird news reader, shown earlier in Figure 14-1.[47]
Figure 14-4. Reading a newsgroup article
The first part of Figure 14-4 shows the NNTP header, which, like a mail or HTTP header, returns status information about the article. Following the header is the article. Notice that in the header and at the beginning of the article, it is also referred to as
The function called to read the news article is shown in Listing 14-7.
function read_nntp_article($server, $newsgroup, $article)
{
# Open socket connection to the news server
$socket = fsockopen($server, $port="119", $errno, $errstr, 30);
if (!$socket)
{
# If socket error, issue error
$return_array['ERROR'] = "ERROR: $errstr ($errno)";
}
else
{
# Else tell server which group to connect to
fputs($socket, "GROUP ".$newsgroup." \r\n");
# Request this article's HEAD
fputs($socket, "HEAD $article \r\n");
$return_array['HEAD'] = read_nntp_buffer($socket);
# Request the article
fputs($socket, "BODY $article \r\n");
$return_array['ARTICLE'] = read_nntp_buffer($socket);
# Sign out and close the socket (only when the connection succeeded)
fputs($socket, "QUIT \r\n"); // Sign out (newsgroup server)
fclose($socket); // Close socket
}
return $return_array; // Return data array
}
Listing 14-7: A function that reads a newsgroup article
As mentioned earlier, NNTP was designed for use on older (slower) networks. For this reason, the article headers are available separately from the actual articles. This allowed news readers to download article headers first, to show users which articles were available on their news servers. If an article interested the viewer, that article alone was downloaded, consuming minimal bandwidth.
* * *
[46] There is a full list of NNTP status codes in Appendix B.
[47] Remember that article IDs are unique to newsgroups on each specific news server. Your article IDs