Webbots, Spiders, and Screen Scrapers - Michael Schrenk [125]
$header_array[] = "Mime-Version: 1.0";
$header_array[] = "Content-type: text/html; charset=iso-8859-1";
$header_array[] = "Accept-Encoding: compress, gzip";
curl_setopt($curl_session, CURLOPT_HTTPHEADER, $header_array);
Listing A-11: Configuring an outgoing header
CURLOPT_SSL_VERIFYPEER
You only need to use this option if the target website uses SSL encryption and the protocol in CURLOPT_URL is https:. An example is shown in Listing A-12.
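curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // Don't verify the server's certificate
Listing A-12: Configuring PHP/CURL not to verify the target server's SSL certificate
Depending on the version of PHP/CURL you use, this option may be required; if you don't use it, PHP/CURL will attempt to verify the target server's certificate against a bundle of trusted certificate authorities, and the transfer will fail if no usable bundle is installed. Keep in mind that turning verification off also removes the protection it provides against impostor servers.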
CURLOPT_USERPWD and CURLOPT_UNRESTRICTED_AUTH
As shown in Listing A-13, you may use the CURLOPT_USERPWD option with a valid username and password to access websites that use basic authentication. In contrast to using a browser, you will have to submit the username and password to every page accessed within the basic authentication realm.
curl_setopt($s, CURLOPT_USERPWD, "username:password");
curl_setopt($s, CURLOPT_UNRESTRICTED_AUTH, TRUE);
Listing A-13: Configuring PHP/CURL for basic authentication schemes
If you use this option in conjunction with CURLOPT_FOLLOWLOCATION, you may also need the CURLOPT_UNRESTRICTED_AUTH option, which keeps sending the username and password to every page you're redirected to, even when a redirect leads to a different hostname.
Exercise caution when using CURLOPT_USERPWD, because you can inadvertently send your username and password to the wrong server, where they may appear in access log files.
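To make this caution concrete, here is a minimal sketch (not the book's own code) of checking a URL's host before attaching credentials. The helper names credentials_allowed and basic_auth_header, and the example.com addresses, are hypothetical. The second helper also shows why leaked credentials are readable: basic authentication only Base64-encodes the username:password pair; it does not encrypt it.

```php
<?php
// Hypothetical helper: TRUE only if $url points at the host we trust with our password.
function credentials_allowed(string $url, string $trusted_host): bool
{
    $host = parse_url($url, PHP_URL_HOST);   // NULL or FALSE if no host can be parsed
    return is_string($host) && strcasecmp($host, $trusted_host) === 0;
}

// Hypothetical helper: the request header that basic authentication sends.
function basic_auth_header(string $username, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($username . ':' . $password);
}

$target = 'http://www.example.com/members/index.php';   // placeholder URL
if (credentials_allowed($target, 'www.example.com')) {
    // Only at this point would you call:
    // curl_setopt($s, CURLOPT_USERPWD, "username:password");
    echo basic_auth_header('username', 'password'), "\n";
}
```

A check like this only covers the first request. When redirects are followed, the destination is chosen by the remote server, so it may be safer to leave CURLOPT_UNRESTRICTED_AUTH off unless you know where the redirects lead.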
CURLOPT_POST and CURLOPT_POSTFIELDS
The CURLOPT_POST and CURLOPT_POSTFIELDS options configure PHP/CURL to emulate forms with the POST method. Since the default method is GET, you must first tell PHP/CURL to use the POST method. Then you must specify the POST data that you want to be sent to the target webserver. An example is shown in Listing A-14.
curl_setopt($s, CURLOPT_POST, TRUE); // Use POST method
$post_data = "var1=1&var2=2&var3=3"; // Define POST data values
curl_setopt($s, CURLOPT_POSTFIELDS, $post_data);
Listing A-14: Configuring POST method transfers
Notice that the POST data looks like a standard query string sent in a GET method. Incidentally, to send form information with the GET method, simply attach the query string to the target URL.
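Writing the POST string by hand works for simple values, but names or values containing spaces or ampersands must be URL-encoded. The sketch below builds the same string from an array with PHP's http_build_query() function; the helper name build_post_data and the example.com URL are hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper: turn an array of form fields into URL-encoded form data.
function build_post_data(array $fields): string
{
    // http_build_query() URL-encodes each name and value and joins the pairs.
    // Passing '&' explicitly avoids depending on the arg_separator.output setting.
    return http_build_query($fields, '', '&');
}

// Same data as the hand-written string in Listing A-14.
$post_data = build_post_data(['var1' => 1, 'var2' => 2, 'var3' => 3]);

// To emulate a GET form instead, attach the same string to the target URL.
$get_url = 'http://www.example.com/search.php?' . $post_data;   // placeholder URL

echo $post_data, "\n", $get_url, "\n";
```

The resulting $post_data can be passed to CURLOPT_POSTFIELDS exactly as in Listing A-14.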
CURLOPT_VERBOSE
The CURLOPT_VERBOSE option controls the quantity of status messages created during a file transfer. You may find this helpful during debugging, but it is best to turn off this option during the production phase, because it produces many entries in your server log file. A typical succession of log messages for a single file download looks like Listing A-15.
* About to connect() to www.schrenk.com port 80
* Connected to www.schrenk.com (66.179.150.101) port 80
* Connection #0 left intact
* Closing connection #0
Listing A-15: Typical messages from a verbose PHP/CURL session
If you're in verbose mode on a busy server, you'll create very large log files. Listing A-16 shows how to turn off verbose mode.
curl_setopt($s, CURLOPT_VERBOSE, FALSE); // Minimal logs
Listing A-16: Turning off verbose mode reduces the size of server log files.
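If you want verbose messages while debugging but don't want them in the server log, one approach is to redirect them. The sketch below assumes the CURLOPT_STDERR option, which is not covered in this section, and uses two hypothetical helpers named capture_verbose and read_verbose; treat it as an illustration rather than the book's method.

```php
<?php
// Hypothetical helper: turn on verbose mode, but send the messages to a
// temporary stream instead of the default error output.
function capture_verbose($s)
{
    $log = fopen('php://temp', 'w+');        // temporary read/write stream
    curl_setopt($s, CURLOPT_VERBOSE, TRUE);  // produce status messages
    curl_setopt($s, CURLOPT_STDERR, $log);   // write them to $log
    return $log;
}

// Hypothetical helper: read back everything captured so far.
function read_verbose($log): string
{
    rewind($log);
    return stream_get_contents($log);
}
```

In use, you would call capture_verbose($s) before curl_exec($s), inspect read_verbose($log) afterward, and then close the stream with fclose($log).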
CURLOPT_PORT
By default, PHP/CURL uses port 80 for all HTTP sessions, unless you are connecting to an SSL encrypted server, in which case port 443 is used.[95] These are the standard port numbers for the HTTP and HTTPS protocols, respectively. If the target server listens on a nonstandard port, or you're connecting with a non-web protocol, use CURLOPT_PORT to set the desired port number, as shown in Listing A-17.
curl_setopt($s, CURLOPT_PORT, 234); // Use port number 234
Listing A-17: Using nonstandard communication ports
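The default-port rule described above can be sketched as a small function. The helper name effective_port and the example.com URLs are hypothetical; the function only reports the port implied by a URL (including a port written into the URL itself) and knows nothing about protocols other than HTTP and HTTPS.

```php
<?php
// Hypothetical helper: which port does a URL imply when CURLOPT_PORT is not set?
function effective_port(string $url): ?int
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'])) {
        return null;                       // not enough information to decide
    }
    if (isset($parts['port'])) {
        return $parts['port'];             // port written in the URL itself
    }
    $defaults = ['http' => 80, 'https' => 443];
    $scheme = strtolower($parts['scheme']);
    return $defaults[$scheme] ?? null;     // scheme without a default known here
}

var_dump(effective_port('http://www.example.com/'));
var_dump(effective_port('https://www.example.com/'));
var_dump(effective_port('http://www.example.com:234/'));
```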
Note
Configuration settings must be capitalized, as shown in the previous examples. This is