Learning Python - Mark Lutz [66]
>>> msg = """ aaaaaaaaaaaaa
bbb'''bbbbbbbbbb""bbbbbbb'bbbb
cccccccccccccc"""
>>> msg
'\naaaaaaaaaaaaa\nbbb\'\'\'bbbbbbbbbb""bbbbbbb\'bbbb\ncccccccccccccc'
Python also supports a raw string literal that turns off the backslash escape mechanism (such string literals start with the letter r), as well as Unicode string support that supports internationalization. In 3.0, the basic str string type handles Unicode too (which makes sense, given that ASCII text is a simple kind of Unicode), and a bytes type represents raw byte strings; in 2.6, Unicode is a separate type, and str handles both 8-bit strings and binary data. Files are also changed in 3.0 to return and accept str for text and bytes for binary data. We’ll meet all these special string forms in later chapters.
Pattern Matching
One point worth noting before we move on is that none of the string object’s methods support pattern-based text processing. Text pattern matching is an advanced tool outside this book’s scope, but readers with backgrounds in other scripting languages may be interested to know that to do pattern matching in Python, we import a module called re. This module has analogous calls for searching, splitting, and replacement, but because we can use patterns to specify substrings, we can be much more general:
>>> import re
>>> match = re.match('Hello[ \t]*(.*)world', 'Hello Python world')
>>> match.group(1)
'Python '
This example searches for a substring that begins with the word “Hello,” followed by zero or more tabs or spaces, followed by arbitrary characters to be saved as a matched group, terminated by the word “world.” If such a substring is found, portions of the substring matched by parts of the pattern enclosed in parentheses are available as groups. The following pattern, for example, picks out three groups separated by slashes:
>>> match = re.match('/(.*)/(.*)/(.*)', '/usr/home/lumberjack')
>>> match.groups()
('usr', 'home', 'lumberjack')
Pattern matching is a fairly advanced text-processing tool by itself, but there is also support in Python for even more advanced text and language processing, including XML parsing and natural language analysis. I’ve already said enough about strings for this tutorial, though, so let’s move on to the next type.
Lists
The Python list object is the most general sequence provided by the language. Lists are positionally ordered collections of arbitrarily typed objects, and they have no fixed size. They are also mutable—unlike strings, lists can be modified in-place by assignment to offsets as well as a variety of list method calls.
Sequence Operations
Because they are sequences, lists support all the sequence operations we discussed for strings; the only difference is that the results are usually lists instead of strings. For instance, given a three-item list:
>>> L = [123, 'spam', 1.23] # A list of three different-type objects
>>> len(L) # Number of items in the list
3
we can index, slice, and so on, just as for strings:
>>> L[0] # Indexing by position
123
>>> L[:-1] # Slicing a list returns a new list
[123, 'spam']
>>> L + [4, 5, 6] # Concatenation makes a new list too
[123, 'spam', 1.23, 4, 5, 6]
>>> L # We're not changing the original list
[123, 'spam', 1.23]
Type-Specific Operations
Python’s lists are related to arrays in other languages, but they tend to be more powerful. For one thing, they have no fixed type constraint—the list we just looked at, for example, contains three objects of completely different types (an integer, a string, and a floating-point number). Further, lists have no fixed size. That is, they can grow and shrink on demand, in response to list-specific operations:
>>> L.append('NI') # Growing: add object at end of list
>>>