

Webbots, Spiders, and Screen Scrapers - Michael Schrenk [123]

handle) to reference an external file. And in both cases, when the file transfer is complete, the session is closed. However, PHP/CURL differs from standard file I/O because it requires that a series of options defining the nature of the file transfer be set before the exchange takes place. These options are set individually, in any order. When many options are required, the list of settings can be long and confusing. For simplicity, Listing A-1 shows the minimal options required to create a PHP/CURL session that will put a downloaded file into a variable.

<?php

# Open a PHP/CURL session

$s = curl_init();

# Configure the cURL command

curl_setopt($s, CURLOPT_URL, "http://www.schrenk.com"); // Define target site

curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE); // Return in string

# Execute the cURL command (send contents of target web page to string)

$downloaded_page = curl_exec($s);

# Close PHP/CURL session

curl_close($s);

?>

Listing A-1: A minimal PHP/CURL session

The rest of this section details how to initiate sessions, set options, execute commands, and close sessions in PHP/CURL. We'll also look at how PHP/CURL provides transfer status and error messages.

* * *

[93] See http://us2.php.net/manual/en/ref.curl.php.

Initiating PHP/CURL Sessions

Before you use cURL, you must initiate a session with the curl_init() function. Initialization creates a session variable, which identifies configurations and data belonging to a specific session. Notice how the session variable $s, created in Listing A-1, is used to configure, execute, and close the entire PHP/CURL session. Once you create a session, you may use it as many times as you need to.
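As a minimal sketch of session reuse, the script below downloads several pages through a single session, changing only CURLOPT_URL between transfers. The function name fetch_all() and its array-of-URLs interface are hypothetical, chosen for illustration; they are not part of PHP/CURL.

```php
<?php
# Hypothetical helper: download several URLs through one reused session
function fetch_all($urls)
{
    $s = curl_init();                               // One session for every request
    curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);  // Return each page in a string

    $pages = array();
    foreach ($urls as $url) {
        curl_setopt($s, CURLOPT_URL, $url);         // Only the target changes
        $pages[$url] = curl_exec($s);               // FALSE if the transfer failed
    }

    curl_close($s);                                 // Close the session once, at the end
    return $pages;
}
```

The returned array is keyed by URL, so a call such as fetch_all(array("http://www.schrenk.com")) would yield one entry holding either the downloaded page or FALSE.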

Setting PHP/CURL Options

The PHP/CURL session is configured with the curl_setopt() function. Each individual configuration option is set with a separate call to this function. The script in Listing A-1 is unusual in its brevity. In normal use, there are many calls to curl_setopt(). There are over 90 separate configuration options available within PHP/CURL, making the interface very versatile.[94] The average PHP/CURL user, however, uses only a small subset of the available options. The following sections describe the PHP/CURL options you are most apt to use. While these options are listed here in order of relative importance, you may declare them in any order. If the session is left open, the configuration may be reused many times within the same session.
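When the list of settings grows long, PHP also offers curl_setopt_array(), which applies a whole array of options in one call. The sketch below is one possible way to group the two options from Listing A-1; the variable names $options and $ok are assumptions made for illustration.

```php
<?php
# Open a PHP/CURL session
$s = curl_init();

# Collect the options in one array (values chosen only for illustration)
$options = array(
    CURLOPT_URL            => "http://www.schrenk.com",  // Define target site
    CURLOPT_RETURNTRANSFER => TRUE,                      // Return in string
);

# Apply every option at once; TRUE only if all options were set successfully
$ok = curl_setopt_array($s, $options);

# Close PHP/CURL session (no transfer was executed in this sketch)
curl_close($s);
```

Grouping the options this way keeps the configuration in one place, though separate curl_setopt() calls work just as well.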

CURLOPT_URL

Use the CURLOPT_URL option to define the target URL for your PHP/CURL session, as shown in Listing A-2.

curl_setopt($s, CURLOPT_URL, "http://www.schrenk.com/index.php");

Listing A-2: Defining the target URL

You should use a fully formed URL describing the protocol, domain, and file in every PHP/CURL file request.

CURLOPT_RETURNTRANSFER

The CURLOPT_RETURNTRANSFER option must be set to TRUE, as in Listing A-3, if you want the result to be returned in a string. If you don't set this option to TRUE, PHP/CURL sends the result directly to standard output: the terminal for a command-line script, or the browser for a web page.

curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE); // Return in string

Listing A-3: Telling PHP/CURL that you want the result to be returned in a string
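Because the result comes back as a return value, a script can inspect it before using it. The following sketch assumes a hypothetical wrapper named download(), which is not part of PHP/CURL; it relies on curl_exec() returning FALSE when a transfer fails and on curl_error() to describe the failure.

```php
<?php
# Hypothetical wrapper: return the page as a string or throw on failure
function download($url)
{
    $s = curl_init();
    curl_setopt($s, CURLOPT_URL, $url);             // Define target site
    curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);  // Return in string

    $result = curl_exec($s);                        // String on success, FALSE on failure
    if ($result === FALSE) {
        $message = curl_error($s);                  // Read the error before closing
        curl_close($s);
        throw new RuntimeException("Download failed: " . $message);
    }

    curl_close($s);
    return $result;
}
```

The strict comparison (=== FALSE) matters here, since an empty page is returned as an empty string, which a loose comparison would also treat as a failure.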

CURLOPT_REFERER

The CURLOPT_REFERER option sets the Referer header of the request, which allows your webbot to spoof the hyper-reference that was clicked to initiate the request for the target file. The example in Listing A-4 tells the target server that someone clicked a link on http://www.a_domain.com/index.php to request the target web page.

curl_setopt($s, CURLOPT_REFERER, "http://www.a_domain.com/index.php");

Listing A-4: Spoofing a hyper-reference

CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS

The CURLOPT_FOLLOWLOCATION option tells cURL that you want it to follow every page redirection it finds. It's important to understand that PHP/CURL only honors header redirections and not redirections set with a refresh meta tag or with JavaScript, as shown in Listing A-5.

<?php

# Example of redirection that cURL will follow

header("Location: http://www.schrenk.com");

?>
