Webbots, Spiders, and Screen Scrapers - Michael Schrenk
Further Exploration
Final Thoughts
9. LINK-VERIFICATION WEBBOTS
Creating the Link-Verification Webbot
Initializing the Webbot and Downloading the Target
Setting the Page Base
Parsing the Links
Running a Verification Loop
Generating Fully Resolved URLs
Downloading the Linked Page
Displaying the Page Status
Running the Webbot
LIB_http_codes
LIB_resolve_addresses
Further Exploration
10. ANONYMOUS BROWSING WEBBOTS
Anonymity with Proxies
Non-proxied Environments
Your Online Exposure
Proxied Environments
The Anonymizer Project
Writing the Anonymizer
Final Thoughts
11. SEARCH-RANKING WEBBOTS
Description of a Search Result Page
What the Search-Ranking Webbot Does
Running the Search-Ranking Webbot
How the Search-Ranking Webbot Works
The Search-Ranking Webbot Script
Initializing Variables
Starting the Loop
Fetching the Search Results
Parsing the Search Results
Final Thoughts
Be Kind to Your Sources
Search Sites May Treat Webbots Differently Than Browsers
Spidering Search Engines Is a Bad Idea
Familiarize Yourself with the Google API
Further Exploration
12. AGGREGATION WEBBOTS
Choosing Data Sources for Webbots
Example Aggregation Webbot
Familiarizing Yourself with RSS Feeds
Writing the Aggregation Webbot
Adding Filtering to Your Aggregation Webbot
Further Exploration
13. FTP WEBBOTS
Example FTP Webbot
PHP and FTP
Further Exploration
14. NNTP NEWS WEBBOTS
NNTP Use and History
Webbots and Newsgroups
Identifying News Servers
Identifying Newsgroups
Finding Articles in Newsgroups
Reading an Article from a Newsgroup
Further Exploration
15. WEBBOTS THAT READ EMAIL
The POP3 Protocol
Logging into a POP3 Mail Server
Reading Mail from a POP3 Mail Server
Executing POP3 Commands with a Webbot
Further Exploration
Email-Controlled Webbots
Email Interfaces
16. WEBBOTS THAT SEND EMAIL
Email, Webbots, and Spam
Sending Mail with SMTP and PHP
Configuring PHP to Send Mail
Sending an Email with mail()
Writing a Webbot That Sends Email Notifications
Keeping Legitimate Mail out of Spam Filters
Sending HTML-Formatted Email
Further Exploration
Using Returned Emails to Prune Access Lists
Using Email as Notification That Your Webbot Ran
Leveraging Wireless Technologies
Writing Webbots That Send Text Messages
17. CONVERTING A WEBSITE INTO A FUNCTION
Writing a Function Interface
Defining the Interface
Analyzing the Target Web Page
Using describe_zipcode()
Final Thoughts
Distributing Resources
Using Standard Interfaces
Designing a Custom Lightweight "Web Service"
III. ADVANCED TECHNICAL CONSIDERATIONS
18. SPIDERS
How Spiders Work
Example Spider
LIB_simple_spider
harvest_links()
archive_links()
get_domain()
exclude_link()
Experimenting with the Spider
Adding the Payload
Further Exploration
Save Links in a Database
Separate the Harvest and Payload
Distribute Tasks Across Multiple Computers
Regulate Page Requests
19. PROCUREMENT WEBBOTS AND SNIPERS
Procurement Webbot Theory
Get Purchase Criteria
Authenticate Buyer
Verify Item
Evaluate Purchase Triggers
Make Purchase
Evaluate Results
Sniper Theory
Get Purchase Criteria
Authenticate Buyer
Verify Item
Synchronize Clocks
Time to Bid?
Submit Bid
Evaluate Results
Testing Your Own Webbots and Snipers
Further Exploration
Final Thoughts
20. WEBBOTS AND CRYPTOGRAPHY
Designing Webbots That Use Encryption
SSL and PHP Built-in Functions
Encryption and PHP/CURL
A Quick Overview of Web Encryption
Local Certificates
Final Thoughts
21. AUTHENTICATION
What Is Authentication?
Types of Online Authentication