Online Book Reader

Webbots, Spiders, and Screen Scrapers - Michael Schrenk

Further Exploration

Final Thoughts

9. LINK-VERIFICATION WEBBOTS

Creating the Link-Verification Webbot

Initializing the Webbot and Downloading the Target

Setting the Page Base

Parsing the Links

Running a Verification Loop

Generating Fully Resolved URLs

Downloading the Linked Page

Displaying the Page Status

Running the Webbot

LIB_http_codes

LIB_resolve_addresses

Further Exploration

10. ANONYMOUS BROWSING WEBBOTS

Anonymity with Proxies

Non-proxied Environments

Your Online Exposure

Proxied Environments

The Anonymizer Project

Writing the Anonymizer

Final Thoughts

11. SEARCH-RANKING WEBBOTS

Description of a Search Result Page

What the Search-Ranking Webbot Does

Running the Search-Ranking Webbot

How the Search-Ranking Webbot Works

The Search-Ranking Webbot Script

Initializing Variables

Starting the Loop

Fetching the Search Results

Parsing the Search Results

Final Thoughts

Be Kind to Your Sources

Search Sites May Treat Webbots Differently Than Browsers

Spidering Search Engines Is a Bad Idea

Familiarize Yourself with the Google API

Further Exploration

12. AGGREGATION WEBBOTS

Choosing Data Sources for Webbots

Example Aggregation Webbot

Familiarizing Yourself with RSS Feeds

Writing the Aggregation Webbot

Adding Filtering to Your Aggregation Webbot

Further Exploration

13. FTP WEBBOTS

Example FTP Webbot

PHP and FTP

Further Exploration

14. NNTP NEWS WEBBOTS

NNTP Use and History

Webbots and Newsgroups

Identifying News Servers

Identifying Newsgroups

Finding Articles in Newsgroups

Reading an Article from a Newsgroup

Further Exploration

15. WEBBOTS THAT READ EMAIL

The POP3 Protocol

Logging into a POP3 Mail Server

Reading Mail from a POP3 Mail Server

Executing POP3 Commands with a Webbot

Further Exploration

Email-Controlled Webbots

Email Interfaces

16. WEBBOTS THAT SEND EMAIL

Email, Webbots, and Spam

Sending Mail with SMTP and PHP

Configuring PHP to Send Mail

Sending an Email with mail()

Writing a Webbot That Sends Email Notifications

Keeping Legitimate Mail out of Spam Filters

Sending HTML-Formatted Email

Further Exploration

Using Returned Emails to Prune Access Lists

Using Email as Notification That Your Webbot Ran

Leveraging Wireless Technologies

Writing Webbots That Send Text Messages

17. CONVERTING A WEBSITE INTO A FUNCTION

Writing a Function Interface

Defining the Interface

Analyzing the Target Web Page

Using describe_zipcode()

Final Thoughts

Distributing Resources

Using Standard Interfaces

Designing a Custom Lightweight "Web Service"

III. ADVANCED TECHNICAL CONSIDERATIONS

18. SPIDERS

How Spiders Work

Example Spider

LIB_simple_spider

harvest_links()

archive_links()

get_domain()

exclude_link()

Experimenting with the Spider

Adding the Payload

Further Exploration

Save Links in a Database

Separate the Harvest and Payload

Distribute Tasks Across Multiple Computers

Regulate Page Requests

19. PROCUREMENT WEBBOTS AND SNIPERS

Procurement Webbot Theory

Get Purchase Criteria

Authenticate Buyer

Verify Item

Evaluate Purchase Triggers

Make Purchase

Evaluate Results

Sniper Theory

Get Purchase Criteria

Authenticate Buyer

Verify Item

Synchronize Clocks

Time to Bid?

Submit Bid

Evaluate Results

Testing Your Own Webbots and Snipers

Further Exploration

Final Thoughts

20. WEBBOTS AND CRYPTOGRAPHY

Designing Webbots That Use Encryption

SSL and PHP Built-in Functions

Encryption and PHP/CURL

A Quick Overview of Web Encryption

Local Certificates

Final Thoughts

21. AUTHENTICATION

What Is Authentication?

Types of Online Authentication