Online Book Reader

Webbots, Spiders, and Screen Scrapers - Michael Schrenk

$size[1]="S";
$price[1]=19.95;

Listing 26-9: Data sample available at http://www.schrenk.com/nostarch/webbots/26_2.php

The webbot receiving this data could convert this string directly into variables with PHP's eval() function, as shown in Listing 26-10.

# Include libraries
include("LIB_http.php");
$url = "http://www.schrenk.com/nostarch/webbots/26_2.php";
$download = http_get($url, "");

# Convert string received into variables
eval($download['FILE']);

# Show imported variables and values
for($xx=0; $xx<count($brand); $xx++)
    {
    echo "BRAND=".$brand[$xx]."<br>".
         "COLOR=".$color[$xx]."<br>".
         "SIZE=".$size[$xx]."<br>".
         "PRICE=".$price[$xx]."<br><br>";
    }

Listing 26-10: Incorrectly interpreting variable/value pairs

While this seems very efficient, the technique has a severe security problem. The eval() function, which interprets the variable settings in Listing 26-10, will execute any PHP code it is given, not just variable assignments. Anyone who controls the data your webbot downloads (the site's owner, an attacker who compromises the site, or someone who tampers with the traffic) can therefore run malicious code directly on your webbot!
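To make the risk concrete, here is a minimal sketch. The server response is hypothetical: it contains two ordinary assignments in the style of Listing 26-9, followed by one extra statement that calls a PHP function. php_uname() is a real but harmless function; the same position in the response could just as easily hold a call to unlink() or system().

```php
<?php
// Hypothetical response from a malicious or compromised server. The first two
// statements look like ordinary data; the third is arbitrary PHP code.
$response = '$brand[0]="Gordon LLC"; $price[0]=19.95; $injected = php_uname("s");';

// The naive approach of Listing 26-10: execute whatever arrived.
eval($response);

// The expected data arrived...
echo "BRAND=" . $brand[0] . "\n";

// ...but so did a function call the webbot never asked for.
echo isset($injected) ? "Injected call ran: " . $injected . "\n" : "No injected code ran\n";
```

Nothing in eval() distinguishes the data from the extra call; both simply run.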

A Safer Method of Passing Variables to Webbots

An improvement on the previous example would verify that only data variables are interpreted by the webbot. We can accomplish this by slightly modifying the variable/value pairs sent to the webbot (shown in Listing 26-11, where the variable names no longer begin with PHP's $ sign) and by adjusting how the webbot processes the data (shown in Listing 26-12). Listing 26-11 shows a new lightweight test interface that delivers information directly as variables for use by a webbot.

brand[0]="Gordon LLC";
style[0]="Cotton T";
color[0]="red";
size[0]="XXL";
price[0]=19.95;

brand[1]="Ava LLC";
style[1]="Girlie T";
color[1]="blue";
size[1]="S";
price[1]=19.95;

Listing 26-11: Data sample used by the script in Listing 26-12

The script in Listing 26-12 shows how the lightweight interface in Listing 26-11 is interpreted.

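# Get http library
include("LIB_http.php");

# Define and download lightweight test interface
$url = "http://www.schrenk.com/nostarch/webbots/26_3.php";
$download = http_get($url, "");

# Convert the received lines into array elements
$raw_vars_array = explode(";", $download['FILE']);

# Convert each of the array elements into a variable declaration
for($xx=0; $xx<count($raw_vars_array); $xx++)
    {
    # Skip elements that are not name=value pairs (such as the text after the last semicolon)
    if(strpos($raw_vars_array[$xx], "=") === false)
        continue;
    list($variable, $value) = explode("=", $raw_vars_array[$xx], 2);
    $variable = trim($variable);
    $value = trim(trim($value), "\"");      # Remove the quotes sent with string values

    # Accept only simple array elements such as brand[0]
    if(!preg_match('/^[A-Za-z_][A-Za-z0-9_]*\[[0-9]+\]$/', $variable))
        continue;

    # var_export() turns the value into a quoted PHP string literal, so it cannot inject code
    $eval_string = '$'.$variable."=".var_export($value, true).";";
    eval($eval_string);
    }

# Echo imported variables
for($xx=0; $xx<count($brand); $xx++)
    {
    echo "BRAND=".$brand[$xx]."<br>".
         "COLOR=".$color[$xx]."<br>".
         "SIZE=".$size[$xx]."<br>".
         "PRICE=".$price[$xx]."<br><br>";
    }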

Listing 26-12: A safe method for directly transferring values from a website to a webbot

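The technique shown in Listing 26-12 imports the variable/value pairs from Listing 26-11 more safely, because the webbot builds each eval() statement itself and directs it only to set a variable to a value, rather than executing the downloaded text as-is. That protection is only as strong as the checks applied to the incoming names and values, however: eval() still runs whatever ends up in the string it is given.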

This lightweight interface has another advantage over XML: the data does not have to appear in any particular order. For example, if you rearranged the lines in Listing 26-11, the webbot would still interpret them correctly, because each line names both the variable and the array index it sets. The same could not be said for the XML data. And while this protocol is slightly less platform independent than XML, most programming languages can still interpret it with a few simple string operations, as the PHP script in Listing 26-12 does.
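A minimal sketch of that order independence, which also goes one step further than Listing 26-12 and avoids eval() entirely: the data is parsed into a nested array instead of into variables. The function name parse_pairs() and the list of allowed field names are assumptions made for illustration, not part of the book's code. The same record is parsed in two different orders and the results are compared.

```php
<?php
// Hypothetical helper: parse the name[index]=value; format of Listing 26-11
// into a nested array, without eval().
function parse_pairs(string $text, array $allowed): array
{
    $data = [];
    foreach (explode(";", $text) as $statement) {
        $statement = trim($statement);
        // Expect exactly: name[digits]=value
        if (!preg_match('/^([A-Za-z_]\w*)\[(\d+)\]=(.*)$/s', $statement, $m)) {
            continue;                               // skip blanks and malformed lines
        }
        [, $name, $index, $value] = $m;
        if (!in_array($name, $allowed, true)) {
            continue;                               // ignore fields the webbot does not expect
        }
        $data[$name][(int)$index] = trim(trim($value), '"');   // strip surrounding quotes
    }
    return $data;
}

// Assumed list of field names this webbot is willing to accept
$allowed = ["brand", "style", "color", "size", "price"];

$in_order = 'brand[0]="Gordon LLC"; color[0]="red"; size[0]="XXL"; price[0]=19.95;';
$shuffled = 'price[0]=19.95; size[0]="XXL"; brand[0]="Gordon LLC"; color[0]="red";';

$a = parse_pairs($in_order, $allowed);
$b = parse_pairs($shuffled, $allowed);

// == compares key/value pairs regardless of key order
echo ($a == $b) ? "Same data in both orders\n" : "Order changed the result\n";
echo $b["brand"][0], "\n";
```

Returning an array rather than creating variables also keeps downloaded data from overwriting variables the webbot already uses. Like Listing 26-12, this sketch splits on semicolons, so it assumes that no value contains a semicolon.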

SOAP

No discussion of machine-readable interfaces is complete without mentioning the Simple Object Access Protocol (SOAP). SOAP is designed to pass instructions and data between specific types of web pages (known as web services) and scripts run by webbots, webservers, or desktop applications. SOAP is the successor of earlier protocols that make remote application calls, like Remote Procedure Call (RPC), Distributed Component Object Model (DCOM), and Common Object Request Broker Architecture (CORBA).

SOAP is a web protocol that uses HTTP and XML as the primary protocols for passing data between computers. In addition, SOAP also provides a layer (or two) of abstraction between the functions
