Learning Python - Mark Lutz [494]
>>> B = b'abc'
>>> B
b'abc'
>>> B = bytes('abc', 'ascii')
>>> B
b'abc'
>>> ord('a')
97
>>> B = bytes([97, 98, 99])
>>> B
b'abc'
>>> B = 'spam'.encode() # Or bytes()
>>> B
b'spam'
>>>
>>> S = B.decode() # Or str()
>>> S
'spam'
From a larger perspective, the last two of these operations are really tools for converting between str and bytes, a topic introduced earlier and expanded upon in the next section.
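As a minimal sketch of that round trip (written as a script for Python 3.X rather than an interactive session, with the variable names and the non-ASCII sample text chosen here purely for illustration), the following shows that encode and decode are inverses as long as the same encoding name is used on both sides, and that the number of bytes produced depends on the encoding:

```python
# str -> bytes -> str round trip with an explicit encoding name
text = 'spam'
data = text.encode('utf-8')          # same result as bytes(text, 'utf-8')
restored = data.decode('utf-8')      # same result as str(data, 'utf-8')

# Non-ASCII text: the byte count depends on the encoding chosen
non_ascii = 'sp\xe4m'                # 4 characters; \xe4 is a-umlaut
as_utf8 = non_ascii.encode('utf-8')      # a-umlaut takes 2 bytes in UTF-8
as_latin1 = non_ascii.encode('latin-1')  # a-umlaut takes 1 byte in Latin-1

print(data, restored)
print(len(non_ascii), len(as_utf8), len(as_latin1))
```

Decoding with a different encoding than the one used to encode may fail or silently yield different text, which is why it is usually safest to name the encoding explicitly on both sides.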
Mixing String Types
In the replace call of the earlier section “Method Calls,” we had to pass in two bytes objects; str objects won’t work there. Although Python 2.X automatically converts str to and from unicode when possible (i.e., when the str is 7-bit ASCII text), Python 3.0 requires specific string types in some contexts and expects you to convert manually when needed:
# Must pass expected types to function and method calls
>>> B = b'spam'
>>> B.replace('pa', 'XY')
TypeError: expected an object with the buffer interface
>>> B.replace(b'pa', b'XY')
b'sXYm'
>>> B = B'spam'
>>> B.replace(bytes('pa'), bytes('xy'))
TypeError: string argument without an encoding
>>> B.replace(bytes('pa', 'ascii'), bytes('xy', 'utf-8'))
b'sxym'
# Must convert manually in mixed-type expressions
>>> b'ab' + 'cd'
TypeError: can't concat bytes to str
>>> b'ab'.decode() + 'cd' # bytes to str
'abcd'
>>> b'ab' + 'cd'.encode() # str to bytes
b'abcd'
>>> b'ab' + bytes('cd', 'ascii') # str to bytes
b'abcd'
Although you can create bytes objects yourself to represent packed binary data, they can also be made automatically by reading files opened in binary mode, as we’ll see in more detail later in this chapter. First, though, we should introduce bytes’s very close, and mutable, cousin.
Using 3.0 (and 2.6) bytearray Objects
So far we’ve focused on str and bytes, since they subsume Python 2’s unicode and str. Python 3.0 has a third string type, though: bytearray, a mutable sequence of integers in the range 0 through 255, which is essentially a mutable variant of bytes. As such, it supports the same string methods and sequence operations as bytes, as well as many of the in-place change operations supported by lists. The bytearray type is also available in Python 2.6 as a back-port from 3.0, but there it does not enforce the strict text/binary distinction that it does in 3.0.
Let’s take a quick tour. bytearray objects may be created by calling the bytearray built-in. In Python 2.6, any string may be used to initialize:
# Creation in 2.6: a mutable sequence of small (0..255) ints
>>> S = 'spam'
>>> C = bytearray(S) # A back-port from 3.0 in 2.6
>>> C # b'..' == '..' in 2.6 (str)
bytearray(b'spam')
In Python 3.0, an encoding name or byte string is required, because text and binary strings do not mix, though byte strings may reflect encoded Unicode text:
# Creation in 3.0: text/binary do not mix
>>> S = 'spam'
>>> C = bytearray(S)
TypeError: string argument without an encoding
>>> C = bytearray(S, 'latin1') # A content-specific type in 3.0
>>> C
bytearray(b'spam')
>>> B = b'spam' # b'..' != '..' in 3.0 (bytes/str)
>>> C = bytearray(B)
>>> C
bytearray(b'spam')
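For comparison, here is a brief sketch of the creation forms side by side, written as a Python 3.X script. Besides the encoded-text and byte-string forms shown above, the bytearray built-in also accepts an iterable of small integers or a single integer size; the variable names are illustrative only:

```python
from_text = bytearray('spam', 'latin1')        # str plus an encoding name
from_bytes = bytearray(b'spam')                # an existing bytes object
from_ints = bytearray([115, 112, 97, 109])     # iterable of ints in 0..255
zeros = bytearray(4)                           # 4 zero-valued bytes

print(from_text == from_bytes == from_ints)
print(zeros)
```

All three of the first forms build the same object; only the source of the byte values differs.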
Once created, bytearray objects are sequences of small integers like bytes and are mutable like lists, though they require an integer for index assignments, not a string (all of the following is a continuation of this session and is run under Python 3.0 unless otherwise noted—see comments for 2.6 usage notes):
# Mutable, but must assign ints, not strings
>>> C[0]
115
>>> C[0] = 'x' # This and the next work in 2.6
TypeError: an integer is required
>>> C[0] = b'x'
TypeError: an integer is required
>>> C[0] = ord('x')
>>> C
bytearray(b'xpam')
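The same rule can be sketched as a Python 3.X script: assigning to a single index requires an integer, while assigning to a slice accepts a byte string. The variable names below are illustrative, and the script does not depend on the exact wording of the error message, which varies between Python releases:

```python
C = bytearray(b'spam')

try:
    C[0] = b'x'                  # a bytes object is not an integer
    failed = False
except TypeError:
    failed = True                # index assignment rejected the bytes object

C[0] = ord('x')                  # an integer works: b'xpam'
C[1:3] = b'XY'                   # slice assignment takes bytes: b'xXYm'
C.append(ord('!'))               # list-like in-place growth: b'xXYm!'

print(failed, C)
```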