Apache Security - Ivan Ristic [161]
Occasionally, you may encounter an application that performs URL decoding twice. This is not correct behavior according to the standards, but it does happen. In this case, an attacker can evade detection by URL-encoding the payload twice.
The URL:
http://www.example.com/paynow.php?p=attack
becomes:
http://www.example.com/paynow.php?p=%61%74%74%61%63%6B
when encoded once (since %61 is an encoded a character, %74 is an encoded t character, and so on), but:
http://www.example.com/paynow.php?p=%2561%2574%2574%2561%2563%256B
when encoded twice (where %25 represents a percent sign).
If you have an IDS watching for the word "attack", it will (rightly) decode the URL only once and fail to detect the word. The word will nevertheless reach the application, which decodes the data twice.
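The difference between one and two decoding passes can be sketched in C. This is a minimal illustration, not code from any particular IDS or application; the function names url_decode_once and hex_value are hypothetical, and the caller is assumed to supply an output buffer at least as large as the input.

```c
#include <ctype.h>

/* Value of one hexadecimal digit; the caller has already checked isxdigit(). */
static int hex_value(unsigned char c) {
    if (isdigit(c)) return c - '0';
    return toupper(c) - 'A' + 10;
}

/* Decode every valid %XX sequence in src exactly once.
 * dst must have room for strlen(src) + 1 bytes (an assumption of this sketch).
 * Sequences that are not valid %XX are copied unchanged. */
void url_decode_once(const char *src, char *dst) {
    while (*src != '\0') {
        if (src[0] == '%' && isxdigit((unsigned char)src[1])
                          && isxdigit((unsigned char)src[2])) {
            *dst++ = (char)(hex_value((unsigned char)src[1]) * 16
                          + hex_value((unsigned char)src[2]));
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Applied once to p=%2561%2574%2574%2561%2563%256B, the function yields p=%61%74%74%61%63%6B, which is all a single-pass IDS would inspect. Applying it a second time to that result, as the faulty application does, yields p=attack.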
There is another way to exploit badly written decoding schemes. As you know, a character is URL-encoded when it is represented with a percent sign followed by two hexadecimal digits (0-F, representing the values 0-15). However, some decoding functions never check whether the two characters following the percent sign are valid hexadecimal digits. Here is what a C function for handling the two digits might look like:
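#include <ctype.h>

/* Decode two characters as one hex byte. Deliberately performs no validation. */
unsigned char x2c(unsigned char *what) {
    unsigned char c0 = toupper(what[0]);
    unsigned char c1 = toupper(what[1]);
    unsigned char digit;

    /* High nibble: letters map to 10 and up; anything else is treated as a digit. */
    digit = ( c0 >= 'A' ? c0 - 'A' + 10 : c0 - '0' );
    digit = digit * 16;
    /* Low nibble, decoded the same way. */
    digit = digit + ( c1 >= 'A' ? c1 - 'A' + 10 : c1 - '0' );
    return digit;
}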
This code performs no validation. It correctly decodes valid URL-encoded characters, but what happens when an invalid combination is supplied? By using characters beyond the valid hexadecimal range, we could smuggle a slash character, for example, past an IDS without it noticing. To do so, we would specify XV for the two characters, since the above algorithm converts them to the ASCII character code for a slash.
The URL:
http://www.example.com/paynow.php?p=/etc/passwd
would therefore be represented by:
http://www.example.com/paynow.php?p=%XVetc%XVpasswd
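The arithmetic behind the XV trick is as follows: X maps to 'X' - 'A' + 10 = 33, and 33 * 16 = 528, which is truncated to 16 when stored in an unsigned char; V maps to 'V' - 'A' + 10 = 31; and 16 + 31 = 47, the ASCII code for a slash. A decoder that validates its input avoids this. The following is a minimal sketch of one way to do that, not the fix used by any particular product; the name x2c_checked and the choice of -1 as the error value are assumptions made for illustration.

```c
#include <ctype.h>

/* Decode two hexadecimal digits into a byte value (0-255).
 * Returns -1 if either character is not a valid hexadecimal digit,
 * so the caller can reject or pass through the malformed sequence. */
int x2c_checked(const unsigned char *what) {
    int value = 0;
    for (int i = 0; i < 2; i++) {
        unsigned char c = what[i];
        if (!isxdigit(c)) {
            return -1;   /* for example 'X' or 'V' */
        }
        value = value * 16 + (isdigit(c) ? c - '0' : toupper(c) - 'A' + 10);
    }
    return value;
}
```

With this variant, the pair 2F decodes to 47 (a slash), while XV is reported as invalid instead of being silently converted.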
Unicode Encoding
Unicode attacks can be effective against applications that understand Unicode. Unicode is the international standard whose goal is to represent every character needed by every written human language as a single integer number (see http://en.wikipedia.org/wiki/Unicode). What is known as Unicode evasion should more correctly be called UTF-8 evasion. In its original fixed-width form, each Unicode character is represented with two bytes, but this is impractical in real life. First, there are large amounts of legacy documents that need to be handled. Second, in many cases only a small number of Unicode characters are needed in a document, so using two bytes per character would be wasteful.
* * *
Tip
Internet Information Server (IIS) supports a special (nonstandard) way of representing Unicode characters, designed to resemble URL encoding. If a letter "u" comes after the percent sign, then the four hexadecimal digits that follow are taken to represent a full Unicode character (for example, %u002F for a slash). This feature has been used in many attacks carried out against IIS servers. You will need to pay attention to this type of attack if you are maintaining an Apache-based reverse proxy to protect IIS servers.
* * *
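A reverse proxy or IDS that wants to inspect such requests first has to recognize the %u form. The following is a minimal sketch of a recognizer, written only from the description in the tip above; the function name parse_percent_u is hypothetical, and accepting an uppercase "U" as well as a lowercase "u" is an assumption rather than documented IIS behavior.

```c
#include <ctype.h>

/* If s starts with %uXXXX (four hexadecimal digits), store the code point
 * in *cp and return 1. Otherwise return 0 and leave *cp untouched. */
int parse_percent_u(const char *s, unsigned int *cp) {
    unsigned int value = 0;
    if (s[0] != '%' || (s[1] != 'u' && s[1] != 'U')) {
        return 0;
    }
    for (int i = 2; i < 6; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isxdigit(c)) {
            return 0;   /* also stops safely at the end of a short string */
        }
        value = value * 16 + (isdigit(c) ? c - '0' : toupper(c) - 'A' + 10);
    }
    *cp = value;
    return 1;
}
```

Given the input %u002F, the function reports code point 0x2F, the slash, which a filter could then compare against the characters it is watching for.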
UTF-8, a transformation format of ISO 10646 (http://www.ietf.org/rfc/rfc2279.txt), allows most files to stay as they are and still be Unicode compatible. Until a special byte sequence is encountered, each byte represents a character from the US-ASCII character set. When a special byte sequence is used, two or more (up to six) bytes can be combined to form a single complex Unicode character.
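The "special byte sequence" is signaled by the high bits of its first byte. As a rough sketch of the RFC 2279 layout (the function name utf8_sequence_length is hypothetical, and note that the later RFC 3629 restricts UTF-8 to at most four bytes), the length of a sequence can be read from its lead byte like this:

```c
/* Return the total number of bytes in a UTF-8 sequence that starts with
 * the given lead byte, following the RFC 2279 layout (up to six bytes).
 * Returns 0 if the byte cannot start a sequence. */
int utf8_sequence_length(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1;   /* 0xxxxxxx: single byte      */
    if ((lead & 0xE0) == 0xC0) return 2;   /* 110xxxxx                   */
    if ((lead & 0xF0) == 0xE0) return 3;   /* 1110xxxx                   */
    if ((lead & 0xF8) == 0xF0) return 4;   /* 11110xxx                   */
    if ((lead & 0xFC) == 0xF8) return 5;   /* 111110xx (RFC 2279 only)   */
    if ((lead & 0xFE) == 0xFC) return 6;   /* 1111110x (RFC 2279 only)   */
    return 0;                              /* 10xxxxxx, 0xFE, or 0xFF    */
}
```

Bytes below 0x80 stand for themselves, which is why plain text passes through unchanged; every byte after the lead byte of a longer sequence has the form 10xxxxxx.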
One aspect of UTF-8 encoding causes problems: characters that are not valid Unicode can still be represented in encoded form. What is worse is multiple representations of each character