Learning Python - Mark Lutz [103]
Triple-quoted block strings
S = r'\temp\spam'             Raw strings
S = b'spam'                   Byte strings in 3.0 (Chapter 36)
S = u'spam'                   Unicode strings in 2.6 only (Chapter 36)
S1 + S2                       Concatenate, repeat
S * 3
S[i]                          Index, slice, length
S[i:j]
len(S)
"a %s parrot" % kind          String formatting expression
"a {0} parrot".format(kind)   String formatting method in 2.6 and 3.0
S.find('pa')                  String method calls: search,
S.rstrip()                    remove whitespace,
S.replace('pa', 'xx')         replacement,
S.split(',')                  split on delimiter,
S.isdigit()                   content test,
S.lower()                     case conversion,
S.endswith('spam')            end test,
'spam'.join(strlist)          delimiter join,
S.encode('latin-1')           Unicode encoding, etc.
for x in S: print(x)          Iteration, membership
'spam' in S
[c * 2 for c in S]
map(ord, S)
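To make the table's entries a bit more concrete, here is a minimal sketch that runs most of them under Python 3.X. The values assigned to S, kind, and strlist are sample data chosen for this illustration, not part of the table itself, and the result names on the left are likewise arbitrary.

```python
# Sample values (assumptions made for this sketch only)
S = 'spam,parrot'
kind = 'dead'
strlist = ['a', 'b', 'c']

joined = S + '!'                          # concatenate
repeated = 'Ni' * 3                       # repeat
first, part, size = S[0], S[1:4], len(S)  # index, slice, length

expr = "a %s parrot" % kind               # formatting expression
meth = "a {0} parrot".format(kind)        # formatting method

where = S.find('pa')                      # search: offset of first match, or -1
stripped = 'spam  \n'.rstrip()            # remove trailing whitespace
swapped = S.replace('pa', 'xx')           # replacement: makes a new string
fields = S.split(',')                     # split on delimiter
digits = '42'.isdigit()                   # content test
lowered = 'SPAM'.lower()                  # case conversion
ends = S.endswith('rrot')                 # end test
glued = 'spam'.join(strlist)              # delimiter join
encoded = S.encode('latin-1')             # str to bytes in 3.X

has = 'spam' in S                         # membership
doubled = [c * 2 for c in 'spam']         # iteration in a comprehension
codes = list(map(ord, 'spam'))            # map makes an iterator in 3.X

print(first, part, size, where, fields, glued, codes)
```

Notice that none of these operations changes S in place; each one produces a new object as its result, so S is still 'spam,parrot' at the end of the run.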
Beyond the core set of string tools in Table 7-1, Python also supports more advanced pattern-based string processing with the standard library’s re (regular expression) module, introduced in Chapter 4, and even higher-level text processing tools such as XML parsers, discussed briefly in Chapter 36. This book’s scope, though, is focused on the fundamentals represented by Table 7-1.
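As a brief taste of the pattern-based processing mentioned above, the following sketch contrasts the plain split method with the re module's split and match calls. The sample line and the patterns are made up for this illustration; we won't rely on them elsewhere in this chapter.

```python
import re

line = 'spam, eggs;  ham'                 # sample text (an assumption)

plain = line.split(',')                   # string method: one fixed delimiter
words = re.split(r'[,;]\s*', line)        # pattern: comma or semicolon, then spaces

match = re.match(r'(\w+),\s*(\w+)', line) # match at the front of the string
pair = match.groups() if match else ()    # substrings matched by the parentheses

print(plain, words, pair)
```

The string method splits only on the exact delimiter it is given, while the pattern version handles both separators and the variable whitespace after them in a single step.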
To cover the basics, this chapter begins with an overview of string literal forms and string expressions, then moves on to look at more advanced tools such as string methods and formatting. Python comes with many string tools, and we won’t look at them all here; the complete story is chronicled in the Python library manual. Our goal here is to explore enough commonly used tools to give you a representative sample; methods we won’t see in action here, for example, are largely analogous to those we will.
* * *
Note
Content note: Technically speaking, this chapter tells only part of the string story in Python—the part most programmers need to know. It presents the basic str string type, which handles ASCII text and works the same regardless of which version of Python you use. That is, this chapter intentionally limits its scope to the string processing essentials that are used in most Python scripts.
From a more formal perspective, ASCII is a simple form of Unicode text. Python addresses the distinction between text and binary data by including distinct object types:
In Python 3.0 there are three string types: str is used for Unicode text (ASCII or otherwise), bytes is used for binary data (including encoded text), and bytearray is a mutable variant of bytes.
In Python 2.6, unicode strings represent wide Unicode text, and str strings handle both 8-bit text and binary data.
The bytearray type is also available as a back-port in 2.6, but not earlier, and it’s not as closely bound to binary data as it is in 3.0. Because most programmers don’t need to dig into the details of Unicode encodings or binary data formats, though, I’ve moved all such details to the Advanced Topics part of this book, in Chapter 36.
If you do need to deal with more advanced string concepts such as alternative character sets or packed binary data and files, see Chapter 36 after reading the material here. For now, we’ll focus on the basic string type and its operations. As you’ll find, the basics we’ll study here also apply directly to the more advanced string types in Python’s toolset.
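For readers curious about the three 3.X types named in this note, here is a small sketch, assuming Python 3.X. The sample text and the variable names are chosen for this illustration only; Chapter 36 has the full story.

```python
text = 'sp\xe4m'                  # str: Unicode text, four characters
latin = text.encode('latin-1')    # bytes: one byte per character here
utf8 = text.encode('utf-8')       # bytes: the non-ASCII character takes two bytes

buf = bytearray(latin)            # bytearray: a mutable variant of bytes
buf[0] = ord('S')                 # in-place change is allowed

try:
    latin[0] = ord('S')           # bytes objects cannot be changed in place
except TypeError:
    bytes_immutable = True

print(len(text), len(latin), len(utf8), buf, latin.decode('latin-1') == text)
```

The same four-character str yields byte sequences of different lengths depending on the encoding, which is why 3.X keeps text and binary data in distinct types.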
* * *
String Literals
By and large, strings are fairly easy to use in Python. Perhaps the most complicated thing about them is that there are so many ways to write them in your code:
Single quotes: 'spa"m'
Double quotes: "spa'm"
Triple quotes: '''... spam ...''', """... spam ..."""
Escape sequences: "s\tp\na\0m"
Raw strings: r"C:\new\test.spm"
Byte strings in 3.0 (see Chapter 36): b'sp\x01am'
Unicode strings in 2.6 only (see Chapter 36): u'eggs\u0020spam'
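The forms in this list can be compared side by side with a short sketch, assuming Python 3.X. The variable names are arbitrary, and the u'...' form is left out because the list marks it as 2.6-only.

```python
single = 'spa"m'              # single quotes allow an embedded double quote
double = "spa'm"              # and vice versa
same = 'spam' == "spam"       # the quote style does not affect the value

triple = """line one
line two"""                   # the line break is kept in the string

escaped = "s\tp\na\0m"        # tab, newline, and a zero (null) character
raw = r"C:\new\test.spm"      # raw: backslashes are kept literally
cooked = "C:\new\test.spm"    # not raw: \n and \t become single characters

byte = b'sp\x01am'            # bytes in 3.X: indexing gives small integers

print(len(escaped), len(raw), len(cooked), byte[2])
```

The lengths show the effect of escapes: each backslash sequence in escaped and cooked counts as one character, while the raw string keeps both of its backslashes.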
The single- and double-quoted forms are by far the most common; the others serve specialized roles, and we’re postponing discussion of the last two advanced forms until Chapter 36. Let’s take a quick look at all the other options in turn.
Single- and Double-Quoted