Webbots, Spiders, and Screen Scrapers - Michael Schrenk [86]

local certificates to authenticate both client and server. In the vast majority of cases, however, there is no need for a local certificate; in fact, I have never been in a situation that required one. However, PHP/CURL supports local encryption certificates, and it's important to configure the related option even if you don't use them. Versions 7.10 and later of cURL verify the server's certificate by default, so they assume that a local certificate is available and will not download a secure web page if one isn't defined.[65] This is counterintuitive since local certificates are seldom used; therefore, LIB_http (the library this book uses to fetch web pages and submit forms) assumes that there is no local encryption certificate and configures PHP/CURL accordingly, as shown in Listing 20-2.

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // Don't check against a local certificate

Listing 20-2: Telling PHP/CURL not to look for a local certificate

Later releases of cURL require this option even when no local certificate is used. For that reason, you should define this option every time you design a PHP/CURL interface.

If your webbot needs to run in a very secure network, a local certificate may be required to authenticate your webbot as a valid user of the web page or service it accesses. If you need to use a local encryption certificate, you can define one with the PHP/CURL options described in Listing 20-3.

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, TRUE); // Certificate in use
curl_setopt($ch, CURLOPT_CAINFO, $file_name);   // Certificate file name

Listing 20-3: Telling PHP/CURL how to use a local encryption certificate

On even rarer occasions, you may have to support multiple local certificates. In those cases, you can define a directory path, instead of a filename, to tell cURL where to find the location of all your encryption certificates, as shown in Listing 20-4.

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, TRUE); // Certificate in use
curl_setopt($ch, CURLOPT_CAPATH, $path);        // Directory where multiple certificates are stored

Listing 20-4: Telling PHP/CURL how to use multiple local encryption certificates
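
The three configurations in Listings 20-2 through 20-4 differ only in which options are set, so they can be folded into one helper. The following is a minimal sketch, not part of LIB_http: the function name certificate_options() and its parameter are hypothetical, and it assumes that a directory path means multiple certificates while any other path names a single certificate file.

```php
<?php
// Hypothetical helper (not part of LIB_http): builds the certificate-related
// PHP/CURL options for the three cases in Listings 20-2 through 20-4.
function certificate_options(?string $cert_location = null): array
{
    // No certificate location given: skip the check, as in Listing 20-2.
    if ($cert_location === null) {
        return [CURLOPT_SSL_VERIFYPEER => false];
    }

    // A directory holds multiple certificates (Listing 20-4); anything
    // else is treated as a single certificate file (Listing 20-3).
    $option = is_dir($cert_location) ? CURLOPT_CAPATH : CURLOPT_CAINFO;

    return [
        CURLOPT_SSL_VERIFYPEER => true,
        $option                => $cert_location,
    ];
}

// Usage sketch; the URL and file path are placeholders.
// $ch = curl_init('https://www.example.com/');
// curl_setopt_array($ch, certificate_options());                    // no certificate
// curl_setopt_array($ch, certificate_options('/path/to/cert.pem')); // one file
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $page = curl_exec($ch);
// curl_close($ch);
```

Returning an array instead of calling curl_setopt() directly keeps the decision separate from the cURL handle, so the same options can be applied with curl_setopt_array() wherever a handle is created.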

* * *

[65] I learned this lesson the hard way when a client flew me to Palo Alto for a week to work on a project. None of my PHP/CURL routines worked on the client's server because it used a later version of cURL than I was using. After a few embarrassing moments, I discovered that the problem involved defining local certificates, even when they aren't used.

Final Thoughts

Occasionally, you can force an encrypted website into transferring unencrypted data by simply changing the protocol from https to http in the request. While this may allow you to download the web page, this technique is a bad idea because, in addition to potentially revealing confidential data, your webbot's actions will look unusual in server log files, which will destroy all attempts at stealth.

Sometimes web developers use the wrong protocol when designing web forms. It's important to remember that the protocol for form submission is set by the form's action attribute, not by the page that holds the form. A relative or missing action inherits the protocol of the page, but an absolute action that begins with http:// sends the form without encryption, even if the form exists on a secure web page! Using the wrong network protocol is a common mistake made by inexperienced web developers. For that reason, when your webbot submits a form, you need to be sure it uses the same form-submission protocol that is defined by the downloaded form. For example, if you download an encrypted form page and the form's action attribute is an absolute http address, the protocol is http, not https! As wrong as it sounds, you need to use the same protocol defined by the web form, even if it is not the proper protocol to use in that specific case. If your webbot uses a protocol that is different from the one browsers use when submitting the form, you may cause the system administrator to scratch his or her head and investigate why one web client isn't using the same protocol everyone else is using.
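
A webbot can work out which protocol a browser would use before it submits a form. The sketch below is one way to do that, assuming standard browser behavior, in which a relative or missing action is resolved against the URL of the page that holds the form and an absolute action names its own protocol. The function name form_submission_scheme() and the example URLs are hypothetical.

```php
<?php
// Hypothetical helper: returns the protocol a browser would use to submit a
// form, given the URL of the page holding the form and the form's action.
function form_submission_scheme(string $page_url, ?string $action): string
{
    $page_scheme = strtolower((string) parse_url($page_url, PHP_URL_SCHEME));

    // Missing or empty action: the form submits to the page's own URL.
    if ($action === null || trim($action) === '') {
        return $page_scheme;
    }

    // An absolute action such as "http://..." names its own protocol.
    if (preg_match('#^([a-z][a-z0-9+.\-]*)://#i', trim($action), $match)) {
        return strtolower($match[1]);
    }

    // Relative actions ("/process.php", "process.php", "//host/process.php")
    // inherit the protocol of the page.
    return $page_scheme;
}

// Usage sketch with placeholder URLs:
// form_submission_scheme('https://www.example.com/login', '/process.php');
//   returns "https"
// form_submission_scheme('https://www.example.com/login',
//                        'http://www.example.com/process.php');
//   returns "http"
```

The second call is the case to watch for: the form lives on a secure page, yet its action sends the data unencrypted, and a webbot that wants to blend in submits it the same way.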

Chapter 21. AUTHENTICATION

If your webbots are going to access sensitive information or handle money, they'll need
