Learning Python - Mark Lutz [488]
More formally, in 3.0 all the current string literal forms ('xxx', "xxx", and triple-quoted blocks) generate a str; adding a b or B just before any of them creates a bytes instead. This new b'...' bytes literal is similar in form to the r'...' raw string used to suppress backslash escapes. Consider the following, run in 3.0:
C:\misc> c:\python30\python
>>> B = b'spam' # Make a bytes object (8-bit bytes)
>>> S = 'eggs' # Make a str object (Unicode characters, 8-bit or wider)
>>> type(B), type(S)
(<class 'bytes'>, <class 'str'>)
>>> B  # Prints as a character string, really sequence of ints
b'spam'
>>> S
'eggs'

The bytes object is actually a sequence of short integers, though it prints its content as characters whenever possible:

>>> B[0], S[0]  # Indexing returns an int for bytes, str for str
(115, 'e')
>>> B[1:], S[1:]  # Slicing makes another bytes or str object
(b'pam', 'ggs')
>>> list(B), list(S)
([115, 112, 97, 109], ['e', 'g', 'g', 's'])  # bytes is really ints

The bytes object is immutable, just like str (though bytearray, described later, is not); you cannot assign a str, bytes, or integer to an offset of a bytes object. The bytes prefix also works for any string literal form:

>>> B[0] = 'x'  # Both are immutable
TypeError: 'bytes' object does not support item assignment
>>> S[0] = 'x'
TypeError: 'str' object does not support item assignment
>>> # bytes prefix works on single, double, triple quotes
>>> B = B"""
... xxxx
... yyyy
... """
>>> B
b'\nxxxx\nyyyy\n'

As mentioned earlier, in Python 2.6 the b'xxx' literal is present for compatibility but is the same as 'xxx' and makes a str, and bytes is just a synonym for str; as you’ve seen, in 3.0 both of these address the distinct bytes type. Also note that the u'xxx' and U'xxx' Unicode string literal forms in 2.6 are gone in 3.0; use 'xxx' instead, since all strings are Unicode, even if they contain only ASCII characters (more on writing non-ASCII Unicode text in the section "Coding Non-ASCII Text").

Conversions

Although Python 2.X allowed str and unicode type objects to be mixed freely (if the strings contained only 7-bit ASCII text), 3.0 draws a much sharper distinction: str and bytes type objects never mix automatically in expressions and are never converted to one another automatically when passed to functions. A function that expects an argument to be a str object won’t generally accept a bytes, and vice versa.
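The rule that str and bytes never mix automatically can be checked with a short script. The following is only a minimal sketch for Python 3.X; the variable names (mixed, error_name, as_text, as_bytes) are illustrative and do not come from the text, and the explicit 'ascii' encoding is an assumption chosen because both values hold plain ASCII.

```python
# Illustrative sketch: names below are hypothetical, not from the book.
B = b'spam'   # bytes object
S = 'eggs'    # str object

# Mixing the two types in an expression is not converted automatically;
# Python 3.X raises a TypeError instead.
error_name = None
try:
    mixed = B + S
except TypeError as exc:
    error_name = type(exc).__name__

# Converting one operand explicitly makes the types agree.
as_text = B.decode('ascii') + S      # bytes -> str, result is a str
as_bytes = B + S.encode('ascii')     # str -> bytes, result is a bytes

print(error_name)
print(as_text)
print(as_bytes)
```

Which direction to convert depends on what the rest of the program expects: text-processing code usually wants str, while code that writes to binary files or sockets usually wants bytes.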
Because of this, Python 3.0 basically requires that you commit to one type or the other, or perform manual, explicit conversions:

str.encode() and bytes(S, encoding) translate a string to its raw bytes form and create a bytes from a str in the process.

bytes.decode() and str(B, encoding) translate raw bytes into string form and create a str from a bytes in the process.

These encode and decode methods (as well as file objects, described in the next section) use either a default encoding for your platform or an explicitly passed-in encoding name. For example, in 3.0:

>>> S = 'eggs'
>>> S.encode()  # str to bytes: encode text into raw bytes
b'eggs'
>>> bytes(S, encoding='ascii')  # str to bytes, alternative
b'eggs'
>>> B = b'spam'
>>> B.decode()  # bytes to str: decode raw bytes into text
'spam'
>>> str(B, encoding='ascii')  # bytes to str, alternative
'spam'

Two cautions here. First of all, your platform’s default encoding is available in the sys module (through sys.getdefaultencoding()), but the encoding argument to bytes is not optional when it is passed a str, even though it is optional in str.encode (and bytes.decode). Second, although calls to str do not require the encoding argument like bytes does, leaving it off in str calls does not mean it defaults; instead, a str call without an encoding returns the bytes object’s print