Online Book Reader

Home Category

Learning Python - Mark Lutz [105]

By Root 1855 0
bytes of a string. For instance, here’s a five-character string that embeds two binary zero bytes (coded as octal escapes of one digit):

>>> s = 'a\0b\0c'

>>> s

'a\x00b\x00c'

>>> len(s)

5

In Python, the zero (null) byte does not terminate a string the way it typically does in C. Instead, Python keeps both the string’s length and text in memory. In fact, no character terminates a string in Python. Here’s a string that is all absolute binary escape codes—a binary 1 and 2 (coded in octal), followed by a binary 3 (coded in hexadecimal):

>>> s = '\001\002\x03'

>>> s

'\x01\x02\x03'

>>> len(s)

3

Notice that Python displays nonprintable characters in hex, regardless of how they were specified. You can freely combine absolute value escapes and the more symbolic escape types in Table 7-2. The following string contains the characters “spam”, a tab and newline, and an absolute zero value byte coded in hex:

>>> S = "s\tp\na\x00m"

>>> S

's\tp\na\x00m'

>>> len(S)

7

>>> print(S)

s p

a m

This becomes more important to know when you process binary data files in Python. Because their contents are represented as strings in your scripts, it’s OK to process binary files that contain any sorts of binary byte values (more on files in Chapter 9).[17]

Finally, as the last entry in Table 7-2 implies, if Python does not recognize the character after a \ as being a valid escape code, it simply keeps the backslash in the resulting string:

>>> x = "C:\py\code" # Keeps \ literally

>>> x

'C:\\py\\code'

>>> len(x)

10

Unless you’re able to commit all of Table 7-2 to memory, though, you probably shouldn’t rely on this behavior.[18] To code literal backslashes explicitly such that they are retained in your strings, double them up (\\ is an escape for one \) or use raw strings; the next section shows how.

Raw Strings Suppress Escapes

As we’ve seen, escape sequences are handy for embedding special byte codes within strings. Sometimes, though, the special treatment of backslashes for introducing escapes can lead to trouble. It’s surprisingly common, for instance, to see Python newcomers in classes trying to open a file with a filename argument that looks something like this:

myfile = open('C:\new\text.dat', 'w')

thinking that they will open a file called text.dat in the directory C:\new. The problem here is that \n is taken to stand for a newline character, and \t is replaced with a tab. In effect, the call tries to open a file named C:(newline)ew(tab)ext.dat, with usually less than stellar results.

This is just the sort of thing that raw strings are useful for. If the letter r (uppercase or lowercase) appears just before the opening quote of a string, it turns off the escape mechanism. The result is that Python retains your backslashes literally, exactly as you type them. Therefore, to fix the filename problem, just remember to add the letter r on Windows:

myfile = open(r'C:\new\text.dat', 'w')

Alternatively, because two backslashes are really an escape sequence for one backslash, you can keep your backslashes by simply doubling them up:

myfile = open('C:\\new\\text.dat', 'w')

In fact, Python itself sometimes uses this doubling scheme when it prints strings with embedded backslashes:

>>> path = r'C:\new\text.dat'

>>> path # Show as Python code

'C:\\new\\text.dat'

>>> print(path) # User-friendly format

C:\new\text.dat

>>> len(path) # String length

15

As with numeric representation, the default format at the interactive prompt prints results as if they were code, and therefore escapes backslashes in the output. The print statement provides a more user-friendly format that shows that there is actually only one backslash in each spot. To verify this is the case, you can check the result of the built-in len function, which returns the number of bytes in the string, independent of display formats. If you count the characters in the print(path) output, you’ll see that there really is just 1 character per backslash, for a total of 15.

Besides directory paths on Windows, raw strings are also commonly

Return Main Page Previous Page Next Page

®Online Book Reader