Learning Python - Mark Lutz [147]
* * *
Storing and parsing packed binary data in files
One other file-related note before we move on: some advanced applications also need to deal with packed binary data, created perhaps by a C language program. Python’s standard library includes a tool to help in this domain—the struct module knows how to both compose and parse packed binary data. In a sense, this is another data-conversion tool that interprets strings in files as binary data.
To create a packed binary data file, for example, open it in 'wb' (write binary) mode, and pass struct a format string and some Python objects. The format string used here means pack as a 4-byte integer, a 4-character string, and a 2-byte integer, all in big-endian form (other format codes handle padding bytes, floating-point numbers, and more):
>>> F = open('data.bin', 'wb') # Open binary output file
>>> import struct
>>> data = struct.pack('>i4sh', 7, 'spam', 8) # Make packed binary data
>>> data
b'\x00\x00\x00\x07spam\x00\x08'
>>> F.write(data) # Write byte string
>>> F.close()
Python creates a binary bytes data string, which we write out to the file normally—this one consists mostly of nonprintable characters printed in hexadecimal escapes, and is the same binary file we met earlier. To parse the values out to normal Python objects, we simply read the string back and unpack it using the same format string. Python extracts the values into normal Python objects—integers and a string:
>>> F = open('data.bin', 'rb')
>>> data = F.read() # Get packed binary data
>>> data
b'\x00\x00\x00\x07spam\x00\x08'
>>> values = struct.unpack('>i4sh', data) # Convert to Python objects
>>> values
(7, 'spam', 8)
Binary data files are advanced and somewhat low-level tools that we won’t cover in more detail here; for more help, see Chapter 36, consult the Python library manual, or import struct and pass it to the help function interactively. Also note that the binary file-processing modes 'wb' and 'rb' can be used to process a simpler binary file such as an image or audio file as a whole without having to unpack its contents.
File context managers
You’ll also want to watch for Chapter 33’s discussion of the file’s context manager support, new in Python 3.0 and 2.6. Though more a feature of exception processing than files themselves, it allows us to wrap file-processing code in a logic layer that ensures that the file will be closed automatically on exit, instead of relying on the auto-close on garbage collection:
with open(r'C:\misc\data.txt') as myfile: # See Chapter 33 for details
for line in myfile:
...use line here...
The try/finally statement we’ll look at in Chapter 33 can provide similar functionality, but at some cost in extra code—three extra lines, to be precise (though we can often avoid both options and let Python close files for us automatically):
myfile = open(r'C:\misc\data.txt')
try:
for line in myfile:
...use line here...
finally:
myfile.close()
Since both these options require more information than we have yet obtained, we’ll postpone details until later in this book.
Other File Tools
There are additional, more advanced file methods shown in Table 9-2, and even more that are not in the table. For instance, as mentioned earlier, seek resets your current position in a file (the next read or write happens at that position), flush forces buffered output to be written out to disk (by default, files are always buffered), and so on.
The Python standard library manual and the reference books described in the Preface provide complete lists of file methods; for a quick look, run a dir or help call interactively, passing in an open file object (in Python 2.6 but not 3.0, you can pass in the name file instead). For more file-processing examples, watch for the sidebar Why You Will Care: File Scanners. It sketches common file-scanning loop code patterns with statements we have not covered enough yet to use here.
Also, note that although the open function and the file objects it returns