Webbots, Spiders, and Screen Scrapers - Michael Schrenk [28]

page into a browser. Once you fill out the form (by hand) and submit it, the form analyzer will provide an analysis similar to the one in Figure 5-3.

This simple diagnosis isn't perfect—use it at your own risk. However, it does allow a webbot developer to verify the form method, agent name, and GET and POST variables as they are presented to the actual form handler. For example, in this particular exercise, it is evident that the form handler expects a POST method with the variables sessionid, email, message, status, gender, and vol.

Forms with a session ID point out the importance of downloading and analyzing the form before emulating it. In this typical case, the session ID is assigned by the server and cannot be predicted. The webbot can accurately use session IDs only by first downloading and parsing the web page containing the form.
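As a minimal sketch of that parsing step, PHP's built-in DOM extension can pull the session ID out of the downloaded page. The sketch assumes the ID arrives in a hidden input named sessionid, the field name reported by the analyzer. The helper name extract_hidden_field and the sample markup are hypothetical; a real webbot would pass in the page it has just fetched.

```php
<?php
# Hypothetical helper: return the value of a named input field, or null if absent
function extract_hidden_field(string $html, string $field_name): ?string
{
    $doc = new DOMDocument();

    # Real-world pages are rarely valid HTML, so collect parser warnings quietly
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('input') as $input) {
        if ($input->getAttribute('name') === $field_name) {
            return $input->getAttribute('value');
        }
    }
    return null;
}

# Stand-in for a downloaded page; this markup is invented for illustration
$page = '<html><body><form method="POST" action="form_analyzer.php">'
      . '<input type="hidden" name="sessionid" value="sdfg73453845">'
      . '<input type="text" name="email">'
      . '</form></body></html>';

# Use the parsed value instead of a hard-coded session ID
$data_array = [];
$data_array['sessionid'] = extract_hidden_field($page, 'sessionid');
```

Returning null for a missing field lets the webbot stop before submitting a form it no longer understands, rather than sending a stale or empty session ID.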

Figure 5-3. Using a form analyzer

If you were to write a script that emulates the form submitted and analyzed in Figure 5-3, it would look something like Listing 5-9.

include("LIB_http.php");

# Initiate addresses
$action="http://www.schrenk.com/nostarch/webbots/form_analyzer.php";
$ref = "";

# Set submission method
$method="POST";

# Set form data and values
$data_array['sessionid'] = "sdfg73453845";
$data_array['email'] = "sales@schrenk.com";
$data_array['message'] = "This is a test message";
$data_array['status'] = "in school";
$data_array['gender'] = "M";
$data_array['vol'] = "on";

$response = http($target=$action, $ref, $method, $data_array, EXCL_HEAD);

Listing 5-9: Using LIB_http to emulate the form analysis in Figure 5-3
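To see what such a request carries on the wire, the following sketch encodes the same form data with PHP's built-in http_build_query(). It assumes the fields are sent as a standard application/x-www-form-urlencoded body, the usual encoding for a POST form like this one. How LIB_http builds the body internally is not shown in this section, so treat this as an illustration rather than a description of the library.

```php
<?php
# Same form data as in the listing above
$data_array = [
    'sessionid' => 'sdfg73453845',
    'email'     => 'sales@schrenk.com',
    'message'   => 'This is a test message',
    'status'    => 'in school',
    'gender'    => 'M',
    'vol'       => 'on',
];

# URL-encode the fields: spaces become "+", "@" becomes "%40"
$post_body = http_build_query($data_array, '', '&');

echo $post_body, "\n";
```

Comparing this string with the POST variables reported by the analyzer is a quick check that no field was dropped or misspelled.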

After you write a form-emulation script, it's a good idea to use the analyzer to verify that the form method and variables match the original form you are attempting to emulate. If you're feeling ambitious, you could improve on this simple form analyzer by designing one that accepts both the submitted and emulated forms and compares them for you.
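One way to approach that comparison is sketched here. It assumes each submission has already been captured as an array holding the request method and the submitted variables. The array layout and the function name compare_submissions are hypothetical and not part of the book's libraries.

```php
<?php
# Hypothetical helper: list the differences between an original and an emulated submission
function compare_submissions(array $original, array $emulated): array
{
    $differences = [];

    # The form method must match (compared case-insensitively)
    if (strtoupper($original['method']) !== strtoupper($emulated['method'])) {
        $differences[] = "method: expected {$original['method']}, got {$emulated['method']}";
    }

    # Variables the real form sent but the webbot did not
    foreach (array_keys(array_diff_key($original['vars'], $emulated['vars'])) as $name) {
        $differences[] = "missing variable: $name";
    }

    # Variables the webbot sent that the real form did not
    foreach (array_keys(array_diff_key($emulated['vars'], $original['vars'])) as $name) {
        $differences[] = "unexpected variable: $name";
    }

    return $differences;
}

# Invented sample captures for illustration
$original = [
    'method' => 'POST',
    'vars'   => ['sessionid' => 'sdfg73453845', 'email' => 'sales@schrenk.com', 'vol' => 'on'],
];
$emulated = [
    'method' => 'GET',
    'vars'   => ['sessionid' => 'sdfg73453845', 'email' => 'sales@schrenk.com', 'status' => 'in school'],
];

$report = compare_submissions($original, $emulated);
print_r($report);
```

The sketch compares variable names only, since values such as the session ID legitimately differ from one request to the next.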

The script in Listing 5-10 is similar to the one running at http://www.schrenk.com/nostarch/webbots/form_analyzer.php. This script is for reference only. You can download the latest copy from this book's website. Note that the PHP sections of this script appear in bold.

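<?php
# Cookie names may not contain spaces, so underscores are used here
setcookie("SET_BY_THIS_PAGE", "This is a diagnostic cookie.");
?>
<!-- The HTML tags and the PHP for the first three table rows were lost from this copy; they are reconstructed here from the visible labels -->
<title>HTTP Request Diagnostic Page</title>
<h1>Webbot Diagnostic Page</h1>
<p>This web page is a tool to diagnose webbot functionality by
examining what the webbot sends to webservers.</p>
<table border="1">
<tr><th>Variable</th><th>Value sent to server</th></tr>
<tr><td>HTTP Request Method</td><td><?php echo $_SERVER['REQUEST_METHOD']; ?></td></tr>
<tr><td>Your IP Address</td><td><?php echo $_SERVER['REMOTE_ADDR']; ?></td></tr>
<tr><td>Server Port</td><td><?php echo $_SERVER['SERVER_PORT']; ?></td></tr>
<tr><td>Referer</td><td><?php
if(isset($_SERVER['HTTP_REFERER']))
    echo $_SERVER['HTTP_REFERER'];
else
    echo "Null\n";
?></td></tr>
<tr><td>Agent Name</td><td><?php
if(isset($_SERVER['HTTP_USER_AGENT']))
    echo $_SERVER['HTTP_USER_AGENT'];
else
    echo "Null\n";
?></td></tr>
<tr><td>Get Variables</td><td><?php
if(count($_GET)>0)
    var_dump($_GET);
else
    echo "Null";
?></td></tr>
<tr><td>Post Variables</td><td><?php
if(count($_POST)>0)
    var_dump($_POST);
else
    echo "Null";
?></td></tr>
<tr><td>Cookies</td><td><?php
if(count($_COOKIE)>0)
    var_dump($_COOKIE);
else
    echo "Null";
?></td></tr>
</table>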

This web page also sets a diagnostic
