
Learning Python - Mark Lutz

30. In Chapter 27, we also used the shelve module, which uses pickle internally. For completeness here, keep in mind that the Python 3.0 version of the pickle module always creates a bytes object, regardless of the default or passed-in “protocol” (data format level). You can see this by using the module’s dumps call to return an object’s pickle string:

C:\misc> C:\Python30\python
>>> import pickle # dumps() returns pickle string
>>> pickle.dumps([1, 2, 3]) # Python 3.0 default protocol=3=binary
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
>>> pickle.dumps([1, 2, 3], protocol=0) # ASCII protocol 0, but still bytes!
b'(lp0\nL1L\naL2L\naL3L\na.'
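As a quick check that should hold in any recent Python 3.X, the following sketch loops over every protocol the running interpreter supports and confirms the result type. The variable names here are illustrative only, and the exact byte content of each pickle varies with the Python version and protocol, so the code tests types rather than values:

```python
import pickle

data = [1, 2, 3]

# Try every protocol this interpreter supports (0 is the ASCII-only format)
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    pickled = pickle.dumps(data, protocol=protocol)
    assert isinstance(pickled, bytes)       # always bytes in 3.X, never str
    assert pickle.loads(pickled) == data    # and it converts back unchanged

# Collect the result types for protocol 0 and for the default protocol
result_types = {type(pickle.dumps(data, protocol=p))
                for p in (0, pickle.DEFAULT_PROTOCOL)}
print(result_types)                         # {<class 'bytes'>}
```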

This implies that files used to store pickled objects must always be opened in binary mode in Python 3.0, since text files use str strings to represent data, not bytes—the dump call simply attempts to write the pickle string to an open output file:

>>> pickle.dump([1, 2, 3], open('temp', 'w')) # Text files fail on bytes!
TypeError: can't write bytes to text stream # Despite protocol value
>>> pickle.dump([1, 2, 3], open('temp', 'w'), protocol=0)
TypeError: can't write bytes to text stream
>>> pickle.dump([1, 2, 3], open('temp', 'wb')) # Always use binary in 3.0
>>> open('temp', 'r').read()
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in ...
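The text-mode failure can also be demonstrated in a script rather than at the interactive prompt. This is a minimal sketch, assuming a scratch file in a temporary directory (the path and the errors list are made up for illustration); the wording of the error message differs across 3.X releases, so the code records only the exception's type:

```python
import os
import pickle
import tempfile

# Hypothetical scratch file; any writable path would do
path = os.path.join(tempfile.mkdtemp(), 'temp')

errors = []
for protocol in (None, 0):                  # None means "use the default"
    try:
        with open(path, 'w') as text_file:  # text mode: expects str, not bytes
            pickle.dump([1, 2, 3], text_file, protocol=protocol)
    except TypeError as exc:
        errors.append(type(exc).__name__)

print(errors)                               # ['TypeError', 'TypeError']
```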

Because pickle data is not decodable Unicode text, the same is true on input—correct usage in 3.0 requires always writing and reading pickle data in binary modes:

>>> pickle.dump([1, 2, 3], open('temp', 'wb'))
>>> pickle.load(open('temp', 'rb'))
[1, 2, 3]
>>> open('temp', 'rb').read()
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
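In script files, the same binary-mode round trip is often written with with statements so that each file is closed promptly. The following is one possible sketch, not the only way to code it; the temporary path is an assumption made for illustration, and the raw bytes read back will differ from those shown above on newer Pythons with different default protocols:

```python
import os
import pickle
import tempfile

# Hypothetical scratch file; any writable path would do
path = os.path.join(tempfile.mkdtemp(), 'temp')
data = [1, 2, 3]

with open(path, 'wb') as out_file:          # binary output for the pickle
    pickle.dump(data, out_file)

with open(path, 'rb') as in_file:           # binary input to unpickle
    loaded = pickle.load(in_file)

with open(path, 'rb') as in_file:           # raw file content is bytes too
    raw = in_file.read()

print(loaded)                               # [1, 2, 3]
print(type(raw))                            # <class 'bytes'>
```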

In Python 2.6 (and earlier), we can get by with text-mode files for pickled data, as long as the protocol is level 0 (the default in 2.6) and we use text mode consistently to convert line-ends:

C:\misc> c:\python26\python
>>> import pickle
>>> pickle.dumps([1, 2, 3]) # Python 2.6 default=0=ASCII
'(lp0\nI1\naI2\naI3\na.'
>>> pickle.dumps([1, 2, 3], protocol=1)
']q\x00(K\x01K\x02K\x03e.'
>>> pickle.dump([1, 2, 3], open('temp', 'w')) # Text mode works in 2.6
>>> pickle.load(open('temp'))
[1, 2, 3]
>>> open('temp').read()
'(lp0\nI1\naI2\naI3\na.'

If you care about version neutrality, though, or don’t want to care about protocols or their version-specific defaults, always use binary-mode files for pickled data—the following works the same in Python 3.0 and 2.6:

>>> import pickle
>>> pickle.dump([1, 2, 3], open('temp', 'wb')) # Version neutral
>>> pickle.load(open('temp', 'rb')) # And required in 3.0
[1, 2, 3]

Because almost all programs let Python pickle and unpickle objects automatically and do not deal with the content of pickled data itself, the requirement to always use binary file modes is the only significant incompatibility in Python 3’s new pickling model. See reference books or Python’s manuals for more details on object pickling.

XML Parsing Tools

XML is a tag-based language for defining structured information, commonly used to define documents and data shipped over the Web. Although some information can be extracted from XML text with basic string methods or the re pattern module, XML’s nesting of constructs and arbitrary attribute text tend to make full parsing more accurate.

Because XML is such a pervasive format, Python itself comes with an entire package of XML parsing tools that support the SAX and DOM parsing models, as well as a package known as ElementTree—a Python-specific API for parsing and constructing XML. Beyond basic parsing, the open source domain provides support for additional XML tools, such as XPath, XQuery, XSLT, and more.
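To give a rough feel for two of these standard library tools, the following sketch extracts the same data with ElementTree and with the DOM model's minidom implementation (SAX is omitted for brevity). The XML document, its tag and attribute names, and the variable names are all invented for illustration:

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString

# Made-up sample document, used only for this example
xml_text = """<books>
    <book isbn="0001"><title>First sample title</title></book>
    <book isbn="0002"><title>Second sample title</title></book>
</books>"""

# ElementTree: elements act like containers of nested elements
root = ET.fromstring(xml_text)
titles = [book.findtext('title') for book in root.findall('book')]
isbns = [book.get('isbn') for book in root.findall('book')]

# DOM: walk an object tree; an element's text is a child Text node
dom = parseString(xml_text)
dom_titles = [node.firstChild.data
              for node in dom.getElementsByTagName('title')]

print(titles)        # ['First sample title', 'Second sample title']
print(isbns)         # ['0001', '0002']
print(dom_titles)    # ['First sample title', 'Second sample title']
```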

XML by definition represents text in Unicode form, to support internationalization. Although most of Python’s XML parsing tools have always returned Unicode strings, in Python 3.0 their results have mutated from the 2.X unicode type to the 3.0 general str string type—which makes sense, given that 3.0’s str string is Unicode, whether the encoding is ASCII or other.
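As a small check of this in Python 3.X, the following sketch parses UTF-8 encoded bytes with ElementTree and inspects the types of the results. The element name, attribute, and content are made up for the example:

```python
import xml.etree.ElementTree as ET

# Made-up document with non-ASCII content, encoded to bytes as UTF-8
xml_bytes = ('<?xml version="1.0" encoding="utf-8"?>'
             '<price currency="\u20ac">caf\u00e9</price>').encode('utf-8')

root = ET.fromstring(xml_bytes)     # the parser decodes per the declaration

text = root.text                    # element content
currency = root.get('currency')     # attribute value

print(type(text), type(currency))   # <class 'str'> <class 'str'>
print(text == 'caf\u00e9', currency == '\u20ac')   # True True
```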

We can’t go into many details here,
