Learning Python - Mark Lutz [145]
Let’s look at an example. When you read a binary data file you get back a bytes object—a sequence of small integers that represent absolute byte values (which may or may not correspond to characters), which looks and feels almost exactly like a normal string:
>>> data = open('data.bin', 'rb').read() # Open binary file: rb=read binary
>>> data # bytes string holds binary data
b'\x00\x00\x00\x07spam\x00\x08'
>>> data[4:8] # Act like strings
b'spam'
>>> data[4:8][0] # But really are small 8-bit integers
115
>>> bin(data[4:8][0]) # Python 3.0 bin() function
'0b1110011'
In addition, binary files do not perform any end-of-line translation on data; text files by default map all forms to and from \n when written and read and implement Unicode encodings on transfers. Since Unicode and binary data is of marginal interest to many Python programmers, we’ll postpone the full story until Chapter 36. For now, let’s move on to some more substantial file examples.
Storing and parsing Python objects in files
Our next example writes a variety of Python objects into a text file on multiple lines. Notice that it must convert objects to strings using conversion tools. Again, file data is always strings in our scripts, and write methods do not do any automatic to-string formatting for us (for space, I’m omitting byte-count return values from write methods from here on):
>>> X, Y, Z = 43, 44, 45 # Native Python objects
>>> S = 'Spam' # Must be strings to store in file
>>> D = {'a': 1, 'b': 2}
>>> L = [1, 2, 3]
>>>
>>> F = open('datafile.txt', 'w') # Create output file
>>> F.write(S + '\n') # Terminate lines with \n
>>> F.write('%s,%s,%s\n' % (X, Y, Z)) # Convert numbers to strings
>>> F.write(str(L) + '$' + str(D) + '\n') # Convert and separate with $
>>> F.close()
Once we have created our file, we can inspect its contents by opening it and reading it into a string (a single operation). Notice that the interactive echo gives the exact byte contents, while the print operation interprets embedded end-of-line characters to render a more user-friendly display:
>>> chars = open('datafile.txt').read() # Raw string display
>>> chars
"Spam\n43,44,45\n[1, 2, 3]${'a': 1, 'b': 2}\n"
>>> print(chars) # User-friendly display
Spam
43,44,45
[1, 2, 3]${'a': 1, 'b': 2}
We now have to use other conversion tools to translate from the strings in the text file to real Python objects. As Python never converts strings to numbers (or other types of objects) automatically, this is required if we need to gain access to normal object tools like indexing, addition, and so on:
>>> F = open('datafile.txt') # Open again
>>> line = F.readline() # Read one line
>>> line
'Spam\n'
>>> line.rstrip() # Remove end-of-line
'Spam'
For this first line, we used the string rstrip method to get rid of the trailing end-of-line character; a line[:−1] slice would work, too, but only if we can be sure all lines end in the \n character (the last line in a file sometimes does not).
So far, we’ve read the line containing the string. Now let’s grab the next line, which contains numbers, and parse out (that is, extract) the objects on that line:
>>> line = F.readline() # Next line from file
>>> line # It's a string here
'43,44,45\n'
>>> parts = line.split(',') # Split (parse) on commas
>>> parts
['43', '44', '45\n']
We used the string split method here to chop up the line on its comma delimiters; the result is a list of substrings containing the individual numbers. We still must convert from strings to integers, though, if we wish to perform math on these:
>>> int(parts[1]) # Convert from string to int
44
>>> numbers = [int(P) for P in parts] # Convert all in list at once
>>> numbers
[43, 44, 45]
As we have learned, int translates a string of digits into an integer object, and the list comprehension expression introduced in Chapter 4 can apply the call to each item in our list all at once