Learning Python - Mark Lutz [146]
Finally, to convert the stored list and dictionary in the third line of the file, we can run them through eval, a built-in function that treats a string as a piece of executable program code (technically, a string containing a Python expression):
>>> line = F.readline()
>>> line
"[1, 2, 3]${'a': 1, 'b': 2}\n"
>>> parts = line.split('$') # Split (parse) on $
>>> parts
['[1, 2, 3]', "{'a': 1, 'b': 2}\n"]
>>> eval(parts[0]) # Convert to any object type
[1, 2, 3]
>>> objects = [eval(P) for P in parts] # Do same for all in list
>>> objects
[[1, 2, 3], {'a': 1, 'b': 2}]
Because the end result of all this parsing and converting is a list of normal Python objects instead of strings, we can now apply list and dictionary operations to them in our script.
Storing native Python objects with pickle
Using eval to convert from strings to objects, as demonstrated in the preceding code, is a powerful tool. In fact, sometimes it’s too powerful. eval will happily run any Python expression—even one that might delete all the files on your computer, given the necessary permissions! If you really want to store native Python objects, but you can’t trust the source of the data in the file, Python’s standard library pickle module is ideal.
The pickle module is an advanced tool that allows us to store almost any Python object in a file directly, with no to- or from-string conversion requirement on our part. It’s like a super-general data formatting and parsing utility. To store a dictionary in a file, for instance, we pickle it directly:
>>> D = {'a': 1, 'b': 2}
>>> F = open('datafile.pkl', 'wb')
>>> import pickle
>>> pickle.dump(D, F) # Pickle any object to file
>>> F.close()
Then, to get the dictionary back later, we simply use pickle again to re-create it:
>>> F = open('datafile.pkl', 'rb')
>>> E = pickle.load(F) # Load any object from file
>>> E
{'a': 1, 'b': 2}
We get back an equivalent dictionary object, with no manual splitting or converting required. The pickle module performs what is known as object serialization—converting objects to and from strings of bytes—but requires very little work on our part. In fact, pickle internally translates our dictionary to a string form, though it’s not much to look at (and may vary if we pickle in other data protocol modes):
>>> open('datafile.pkl', 'rb').read() # Format is prone to change!
b'\x80\x03}q\x00(X\x01\x00\x00\x00aq\x01K\x01X\x01\x00\x00\x00bq\x02K\x02u.'
Because pickle can reconstruct the object from this format, we don’t have to deal with that ourselves. For more on the pickle module, see the Python standard library manual, or import pickle and pass it to help interactively. While you’re exploring, also take a look at the shelve module. shelve is a tool that uses pickle to store Python objects in an access-by-key filesystem, which is beyond our scope here (though you will get to see an example of shelve in action in Chapter 27, and other pickle examples in Chapters 30 and 36).
* * *
Note
Note that I opened the file used to store the pickled object in binary mode; binary mode is always required in Python 3.0, because the pickler creates and uses a bytes string object, and these objects imply binary-mode files (text-mode files imply str strings in 3.0). In earlier Pythons it’s OK to use text-mode files for protocol 0 (the default, which creates ASCII text), as long as text mode is used consistently; higher protocols require binary-mode files. Python 3.0’s default protocol is 3 (binary), but it creates bytes even for protocol 0. See Chapter 36, Python’s library manual, or reference books for more details on this.
Python 2.6 also has a cPickle module, which is an optimized version of pickle that can be imported directly for speed. Python 3.0 renames this module _pickle and uses it automatically in pickle—scripts simply import