Online Book Reader


Webbots, Spiders, and Screen Scrapers - Michael Schrenk [25]

Before we can go further, it's important that you understand the various parts of HTML forms.

Form Handlers, Data Fields, Methods, and Event Triggers

Web-based forms have four main parts, as shown in Figure 5-2:

A form handler

One or more data fields

A method

One or more event triggers

I'll examine each of these parts in detail and then show how a webbot emulates a form.

Figure 5-2. Parts of a form

Form Handlers

The action attribute in the <form> tag defines the web page that interprets the data entered into the form. We'll refer to this page as the form handler. If there is no defined action, the form handler is the same as the page that contains the form. The examples in Table 5-1 compare the location of form handlers in a variety of conditions.

Table 5-1. Variations in Form-Handler Descriptions

action Attribute | Meaning

<form name="myForm" action="search.php"> | The script called search.php will accept and interpret the form data. This script shares the same server and directory as the page that served the form.

<form name="myForm" action="../cgi/search.php"> | A script called search.php handles this form and is in the cgi directory, which is parallel to the current directory (both share the same parent directory).

<form name="myForm" action="/search.php"> | The script called search.php, in the home (root) directory of the server that served the page, handles this form.

<form name="myForm" action="http://www.schrenk.com/search.php"> | The contents of this form are sent to the specified page at http://www.schrenk.com.

<form name="myForm"> | There isn't an action (or form handler) specified in the <form> tag. In these cases, the same page that delivered the form is also the page that interprets the completed form.
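A webbot has to turn the action attribute into an absolute URL before it can submit anything. The following is a minimal sketch of that resolution in Python, using the standard library's urljoin. The page address http://www.example.com/forms/index.php and the helper name resolve_form_handler are assumptions made for illustration and do not come from the book.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that served the form (an assumption).
PAGE_URL = "http://www.example.com/forms/index.php"


def resolve_form_handler(page_url, action=None):
    """Return the absolute URL of the form handler for a given action value."""
    if not action:
        # No action attribute: the page that delivered the form handles it.
        return page_url
    # urljoin applies the usual relative-URL rules against the page address.
    return urljoin(page_url, action)


# The action values from Table 5-1, plus the "no action" case.
for action in ("search.php", "../cgi/search.php", "/search.php",
               "http://www.schrenk.com/search.php", None):
    print(repr(action), "->", resolve_form_handler(PAGE_URL, action))
```

Under these rules, an action value without a scheme, such as www.schrenk.com/search.php, would be treated as a path relative to the current directory, so a handler on another server needs the leading http://.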

Servers have no use for the form's name, which is the variable that identifies the form. This variable is only used by JavaScript, which associates the form name with its form elements. Since servers don't use the form's name, webbots (and their designers) have no use for it either.

Data Fields

Form input tags define data fields and the name, value, and user interface used to input the value. The user interface (or widget) can be a text box, text area, select list, radio control, checkbox, or hidden element. Remember that while there are many types of interfaces, they are completely meaningless to the webbot that emulates the form and the server that handles the form. From a webbot's perspective, there is no difference between data entered via a text box or a select list. The input tag's name and its value are the only things that matter.

Every data field must have a name.[17] These names become form data variables, or containers for their data values. In Listing 5-1, a variable called session_id is set to 0001, and the value for search is whatever was in the text box labeled Search when the user clicked the submit button. Again, from a webbot designer's perspective, it doesn't matter what type of data elements define the data fields (hidden, select, radio, text box, etc.). It is important that the data has the correct name and that the value is within a range expected by the form handler.

Listing 5-1: Data fields in an HTML form
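From a webbot's point of view, the data fields described above reduce to name/value pairs. The sketch below shows one way to collect them in Python with the standard library's HTMLParser. The form markup in FORM_HTML is a hypothetical stand-in that matches the description (a hidden session_id set to 0001 and a text box named search); it is not the book's listing, and the names FieldCollector and collect_fields are likewise assumptions.

```python
from html.parser import HTMLParser

# Hypothetical form matching the description in the text (an assumption,
# not the original listing).
FORM_HTML = """
<form name="myForm" action="search.php" method="GET">
  <input type="hidden" name="session_id" value="0001">
  <input type="text" name="search" value="">
  <input type="submit" value="Search">
</form>
"""


class FieldCollector(HTMLParser):
    """Collect the name and default value of every named <input> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # The widget type is ignored: only the name and value matter.
            self.fields[name] = attributes.get("value") or ""


def collect_fields(html):
    collector = FieldCollector()
    collector.feed(html)
    return collector.fields


fields = collect_fields(FORM_HTML)
fields["search"] = "hello"  # the webbot supplies what a user would have typed
print(fields)
```

The submit button in this sketch has no name, so it contributes no data. A fuller version would also need to handle select, textarea, and similar elements, which this one skips.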

Methods

The form's method describes the protocol used to send the form data to the form handler. The most common methods for form data transfers are GET and POST.
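To make the difference between the two methods concrete, here is a minimal sketch that builds (but never sends) a request for each one with Python's urllib. The handler address, the field values, and the helper name build_request are assumptions chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical handler and field values, used only for illustration.
HANDLER = "http://www.example.com/search.php"
FIELDS = {"term": "hello", "sort": "up"}


def build_request(handler, fields, method="GET"):
    """Build (but do not send) a request that carries the form data."""
    encoded = urlencode(fields)
    if method.upper() == "POST":
        # POST: the encoded fields travel in the request body.
        return Request(handler, data=encoded.encode("ascii"), method="POST")
    # GET: the encoded fields are appended to the URL after a ? character.
    return Request(handler + "?" + encoded, method="GET")


get_request = build_request(HANDLER, FIELDS, "GET")
post_request = build_request(HANDLER, FIELDS, "POST")
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

In both cases the same name/value pairs are encoded the same way; only the place where they travel changes.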

The GET Method

You are already familiar with the GET method, because it is the same method you used to request web pages in previous chapters. With the GET method, the URL of a web page is combined with data from form elements. The address of the page and the data are separated by a ? character, and individual data variables are separated by & characters, as shown in Listing 5-2. The portion of the URL that follows the ? character is known as a query string.

URL http://www.schrenk.com/search.php?term=hello&sort=up

Listing 5-2: Data values
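The URL in Listing 5-2 can be assembled, and taken apart again, with Python's urllib.parse. This is a minimal sketch; the helper names build_get_url and read_query_string are assumptions, and the second call uses a made-up search term to show how a space is encoded.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_url(handler, fields):
    """Join the handler address and the form data with a ? character."""
    # urlencode joins name=value pairs with & and escapes unsafe characters.
    return handler + "?" + urlencode(fields)


def read_query_string(url):
    """Return the data variables carried in the query string of a URL."""
    query = urlsplit(url).query  # everything after the ? character
    # parse_qs returns a list per name, because a name may repeat.
    return parse_qs(query)


url = build_get_url("http://www.schrenk.com/search.php",
                    {"term": "hello", "sort": "up"})
print(url)
print(read_query_string(url))

# Characters such as spaces are escaped so the URL stays valid.
print(build_get_url("http://www.schrenk.com/search.php",
                    {"term": "hello world"}))
```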
