Learning Python - Mark Lutz [493]
Using 3.0 Bytes Objects
We studied a wide variety of operations available for Python 3.0’s general str string type in Chapter 7; the basic string type works identically in 2.6 and 3.0, so we won’t rehash this topic. Instead, let’s dig a bit deeper into the operation sets provided by the new bytes type in 3.0.
As mentioned previously, the 3.0 bytes object is a sequence of small integers, each in the range 0 through 255, that happens to print as ASCII characters when displayed. It supports sequence operations and most of the same methods available on str objects (and present in 2.X’s str type). However, bytes does not support the format method or the % formatting expression, and you cannot mix and match bytes and str type objects without explicit conversions. In general, you will use str objects and text files for text data, and bytes objects and binary files for binary data.
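To make the "no mixing" rule concrete, here is a minimal sketch of what happens when str and bytes meet in one expression, and how an explicit conversion resolves it. The variable names are illustrative only, and the ASCII encoding is an assumption chosen because the sample text is plain ASCII:

```python
# Sketch: str and bytes do not mix without an explicit conversion.
text = 'spam'     # str: text data
data = b'spam'    # bytes: binary data

try:
    combined = text + data          # Mixing the two types fails
except TypeError as exc:
    mix_error = type(exc).__name__
else:
    mix_error = None

# Explicit conversions state which way to go (ASCII assumed here)
as_bytes = text.encode('ascii') + data    # str -> bytes, then concatenate
as_text = text + data.decode('ascii')     # bytes -> str, then concatenate

print(mix_error)    # TypeError
print(as_bytes)     # b'spamspam'
print(as_text)      # spamspam
```

Which direction you convert depends on what the result is for: encode when the data is headed to a binary file, decode when it is headed for text processing.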
Method Calls
If you really want to see what attributes str has that bytes doesn’t, you can always compare the results of the dir built-in function for each type. The output can also tell you something about the expression operators they support (e.g., __mod__ and __rmod__ implement the % operator):
C:\misc> c:\python30\python
# Attributes unique to str
>>> set(dir('abc')) - set(dir(b'abc'))
{'isprintable', 'format', '__mod__', 'encode', 'isidentifier',
'_formatter_field_name_split', 'isnumeric', '__rmod__', 'isdecimal',
'_formatter_parser', 'maketrans'}
# Attributes unique to bytes
>>> set(dir(b'abc')) - set(dir('abc'))
{'decode', 'fromhex'}
As you can see, str and bytes have almost identical functionality. Their unique attributes are generally methods that don’t apply to the other; for instance, decode translates raw bytes into a str, and encode translates a str into its raw bytes representation. Most of the methods are the same, though bytes methods require bytes arguments (again, 3.0 string types don’t mix). Also recall that bytes objects are immutable, just like str objects in both 2.6 and 3.0 (error messages here have been shortened for brevity):
>>> B = b'spam' # b'...' bytes literal
>>> B.find(b'pa')
1
>>> B.replace(b'pa', b'XY') # bytes methods expect bytes arguments
b'sXYm'
>>> B.split(b'pa')
[b's', b'm']
>>> B
b'spam'
>>> B[0] = 'x'
TypeError: 'bytes' object does not support item assignment
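The points just made can be pulled together in one short script: bytes methods reject str arguments, encode and decode move between the two types, and "changing" a bytes object really means building a new one. This is only a sketch; the names are illustrative, and the ASCII encoding is an assumption that suits this sample data:

```python
B = b'spam'

# bytes methods expect bytes arguments; passing str raises TypeError
try:
    B.replace('pa', 'XY')
except TypeError:
    str_arg_rejected = True
else:
    str_arg_rejected = False

# decode: bytes -> str; encode: str -> bytes (ASCII assumed)
S = B.decode('ascii')
round_trip = S.encode('ascii')

# bytes is immutable: make a new object instead of assigning in place
changed = b'x' + B[1:]

print(str_arg_rejected)    # True
print(S, round_trip)       # spam b'spam'
print(changed, B)          # b'xpam' b'spam'
```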
One notable difference is that string formatting works only on str objects in 3.0, not on bytes objects (see Chapter 7 for more on string formatting expressions and methods):
>>> b'%s' % 99
TypeError: unsupported operand type(s) for %: 'bytes' and 'int'
>>> '%s' % 99
'99'
>>> b'{0}'.format(99)
AttributeError: 'bytes' object has no attribute 'format'
>>> '{0}'.format(99)
'99'
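If you need formatted content in a bytes object, one common workaround is to format as str and then encode the result. The following is a minimal sketch under that approach rather than the only way to do it; the names and the ASCII encoding are assumptions for illustration. (Python 3.5 and later also accept % formatting on bytes, but the technique here works in every 3.X release.)

```python
value = 99

# Format as text first, then convert the result to bytes (ASCII assumed)
formatted = '{0}'.format(value).encode('ascii')

# Same idea with the % expression; parentheses keep the encode call
# applied to the formatted str before the bytes concatenation runs
record = b'count=' + ('%s' % value).encode('ascii')

print(formatted)    # b'99'
print(record)       # b'count=99'
```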
Sequence Operations
Besides method calls, all the usual generic sequence operations you know (and possibly love) from Python 2.X strings and lists work as expected on both str and bytes in 3.0; this includes indexing, slicing, concatenation, and so on. Notice in the following that indexing a bytes object returns an integer giving the byte’s binary value; bytes really is a sequence of 8-bit integers, but for convenience it prints as a string of ASCII-coded characters when displayed as a whole. To see which character a given byte value stands for, use the chr built-in to convert it back to its character, as in the following:
>>> B = b'spam' # A sequence of small ints
>>> B # Prints as ASCII characters
b'spam'
>>> B[0] # Indexing yields an int
115
>>> B[-1]
109
>>> chr(B[0]) # Show character for int
's'
>>> list(B) # Show all the bytes' int values
[115, 112, 97, 109]
>>> B[1:], B[:-1]
(b'pam', b'spa')
>>> len(B)
4
>>> B + b'lmn'
b'spamlmn'
>>> B * 4
b'spamspamspamspam'
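One consequence of the listing above is easy to trip over: indexing a bytes object gives an int, but slicing gives another bytes object, even for a one-item slice. The following sketch spells out that difference, along with iteration and membership tests; the variable names are illustrative only:

```python
B = b'spam'

first_int = B[0]        # Indexing yields an int: 115
first_slice = B[0:1]    # Slicing yields bytes: b's'

# An int never compares equal to a bytes object; compare like with like
same_as_bytes = (first_int == b's')        # False
same_as_ord = (first_int == ord('s'))      # True: ord is the inverse of chr

# Iteration visits the int values one at a time
chars = [chr(i) for i in B]                # ['s', 'p', 'a', 'm']

# Membership accepts either a byte value or a bytes subsequence
has_int = 112 in B                         # True (112 is 'p')
has_sub = b'pa' in B                       # True

print(first_int, first_slice, same_as_bytes, same_as_ord)
print(chars, has_int, has_sub)
```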
Other Ways to Make bytes Objects
So far, we’ve been mostly making bytes objects with the b'...' literal syntax; they can also be created by calling the bytes constructor with a str and an encoding name, calling the bytes constructor with an iterable of integers representing byte values,