Online Book Reader

Webbots, Spiders, and Screen Scrapers - Michael Schrenk [58]

                "", "", EXCL);
            }
        }
    }
    return $rss_array;
}

Listing 12-8: Adding filtering to the download_parse_rss() function

Listing 12-8 is identical to Listing 12-4, with the following exceptions:

The filter array is passed to download_parse_rss()

Each news story is compared to every keyword

Only stories that contain a keyword are parsed and placed into $rss_array

The end result of the script in Listing 12-8 is an aggregator that lists only the stories containing at least one of the keywords in $filter_array. As configured, the comparison of stories and keywords is not case sensitive. If case sensitivity is required, simply replace stristr() with strstr(). Remember, however, that the amount of data returned depends directly on the number of keywords and on how often they appear in stories.
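The keyword comparison described above can be tried in isolation. The following is only a minimal sketch, not the book's download_parse_rss() function: filter_stories() is a hypothetical helper, the $case_sensitive switch is an assumption added for illustration, and the story titles are made-up sample data rather than a downloaded feed.

```php
<?php
// Hypothetical helper (not from the book): keeps only the stories that
// contain at least one keyword from $filter_array.
function filter_stories(array $stories, array $filter_array, bool $case_sensitive = false): array
{
    $matches = array();
    foreach ($stories as $story) {
        foreach ($filter_array as $keyword) {
            // stristr() ignores case; strstr() respects it
            $found = $case_sensitive
                ? strstr($story, $keyword)
                : stristr($story, $keyword);
            if ($found !== false) {
                $matches[] = $story;
                break; // one matching keyword is enough, so skip the rest
            }
        }
    }
    return $matches;
}

// Made-up sample data standing in for parsed news stories
$stories = array(
    "Webbots index the Web",
    "FTP turns fifty",
    "Local weather report",
);
$filter_array = array("webbots", "ftp");

$loose  = filter_stories($stories, $filter_array);       // case-insensitive match
$strict = filter_stories($stories, $filter_array, true); // case-sensitive match
```

With these lowercase keywords, the case-insensitive pass keeps the first two stories, while the case-sensitive pass keeps none, because the stories spell "Webbots" and "FTP" with capital letters.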

Further Exploration

The true power of webbots is that they can make decisions and take action with the information they find online. Here are a few suggestions for extending what you've learned to do with RSS or other data you choose to aggregate with your webbots.

Modify the script in Listing 12-8 to accept stories that don't contain a keyword.

Write an aggregation webbot that doesn't display information unless it finds it on two or more sources.

Design a webbot that looks for specific keywords in news stories and sends an email notification when those keywords appear.

Search blogs for spelling errors.

Find an RSS feed that posts scores from your favorite sports team. Parse and store the scores in a database for later statistical analysis.

Write a webbot that uses news stories to help you decide whether to buy or sell commodities futures.

Devise an online clipping service that archives information about your company.

Create an RSS feed for the example store used in Chapter 7.

Chapter 13. FTP WEBBOTS

File transfer protocol (FTP) is among the oldest Internet protocols.[41] It dates from the Internet's predecessor ARPANET, which was originally funded by the Eisenhower administration.[42] Research centers started using FTP to exchange large files in the early 1970s, and FTP became the de facto transport protocol for email, a status it maintained until the early 1990s. Today, system administrators most commonly use FTP to allow web developers to upload and maintain files on remote webservers. Though it's an older protocol, FTP still allows computers with dissimilar technologies to share files, independent of file structure and operating system.

Example FTP Webbot

To get a sense of how an FTP-capable webbot might be used, consider this scenario. A national retailer needs to move large sales reports from each of its stores to a centralized corporate webserver. This particular retail chain was built through acquisition, so it uses multiple protocols and proprietary computer systems. The one thing all of these systems have in common is access to an FTP server. The goal for this project is to use FTP to download store sales reports and move them to the corporate server.

The script for this example project is available for study at this book's website. Just remember that the script satisfies a fictitious scenario and will not run unless you change the configuration. In this chapter, I have split it up and annotated the sections for clarity. Listing 13-1 shows the initialization for the FTP servers.

// Define the source FTP server, file location, and authentication values
define("REMOTE_FTP_SERVER", "remote_FTP_address"); // Domain name or IP address
define("REMOTE_USERNAME", "yourUserName");
define("REMOTE_PASSWORD", "yourPassword");
define("REMOTE_DIRCTORY", "daily_sales");
define("REMOTE_FILE", "sales.txt");

// Define the corporate FTP server, file location, and authentication values
define("CORP_FTP_SERVER", "corp_FTP_address");     // Domain name or IP address
define("CORP_USERNAME", "yourUserName");
define("CORP_PASSWORD", "yourPassword");
define("CORP_DIRCTORY", "sales_reports");
// date("Y-M-d") appends the four-digit year, three-letter month, and two-digit day
define("CORP_FILE", "store_03_".date("Y-M-d"));

Listing 13-1: Initializing the FTP bot
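With values like those in Listing 13-1 in place, the transfer itself could be sketched with PHP's built-in FTP functions. What follows is only an illustration under stated assumptions, not the script from this book's website: the helper names corp_file_name(), open_ftp(), and transfer_report(), the array keys, the temporary local file, and the choice of passive mode are all hypothetical.

```php
<?php
// Hypothetical helper: builds a file name in the style of CORP_FILE.
function corp_file_name(string $store_id, int $timestamp): string
{
    return "store_" . $store_id . "_" . date("Y-M-d", $timestamp);
}

// Hypothetical helper: connects, logs in, and changes to $dir.
// Returns the FTP connection, or false if any step fails.
function open_ftp(string $server, string $user, string $pass, string $dir)
{
    $conn = ftp_connect($server);
    if ($conn === false) {
        return false;
    }
    // Passive mode is an assumption; it must be set after a successful login
    if (!ftp_login($conn, $user, $pass)
        || !ftp_pasv($conn, true)
        || !ftp_chdir($conn, $dir)) {
        ftp_close($conn);
        return false;
    }
    return $conn;
}

// Hypothetical sketch: downloads the store's report to a temporary local
// file, then uploads that file to the corporate server.
// Each array holds the keys: server, user, pass, dir, file.
function transfer_report(array $remote, array $corp): bool
{
    $tmp = tempnam(sys_get_temp_dir(), "ftp");
    if ($tmp === false) {
        return false;
    }
    $ok = false;

    $src = open_ftp($remote["server"], $remote["user"], $remote["pass"], $remote["dir"]);
    if ($src !== false) {
        $ok = ftp_get($src, $tmp, $remote["file"], FTP_BINARY);
        ftp_close($src);
    }

    if ($ok) {
        $ok = false;
        $dst = open_ftp($corp["server"], $corp["user"], $corp["pass"], $corp["dir"]);
        if ($dst !== false) {
            $ok = ftp_put($dst, $corp["file"], $tmp, FTP_BINARY);
            ftp_close($dst);
        }
    }

    unlink($tmp); // remove the temporary copy whether or not the upload worked
    return $ok;
}
```

A caller would fill the two arrays from the constants in Listing 13-1, for example "server" => REMOTE_FTP_SERVER and "file" => REMOTE_FILE for the store, and "file" => CORP_FILE for the corporate side. Nothing here connects to a server until transfer_report() is actually called.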

This program also configures
