Learning Python - Mark Lutz [144]
>>> myfile = open('myfile.txt', 'w') # Open for text output: create/empty
>>> myfile.write('hello text file\n') # Write a line of text: string
16
>>> myfile.write('goodbye text file\n')
18
>>> myfile.close() # Flush output buffers to disk
>>> myfile = open('myfile.txt') # Open for text input: 'r' is default
>>> myfile.readline() # Read the lines back
'hello text file\n'
>>> myfile.readline()
'goodbye text file\n'
>>> myfile.readline() # Empty string: end of file
''
Notice that file write calls return the number of characters written in Python 3.0; in 2.6 they don’t, so you won’t see these numbers echoed interactively. This example writes each line of text, including its end-of-line terminator, \n, as a string; write methods don’t add the end-of-line character for us, so we must include it to properly terminate our lines (otherwise the next write will simply extend the current line in the file).
If you want to display the file’s content with end-of-line characters interpreted, read the entire file into a string all at once with the file object’s read method and print it:
>>> open('myfile.txt').read() # Read all at once into string
'hello text file\ngoodbye text file\n'
>>> print(open('myfile.txt').read()) # User-friendly display
hello text file
goodbye text file
And if you want to scan a text file line by line, file iterators are often your best option:
>>> for line in open('myfile'): # Use file iterators, not reads
... print(line, end='')
...
hello text file
goodbye text file
When coded this way, the temporary file object created by open will automatically read and return one line on each loop iteration. This form is usually easiest to code, good on memory use, and may be faster than some other options (depending on many variables, of course). Since we haven’t reached statements or iterators yet, though, you’ll have to wait until Chapter 14 for a more complete explanation of this code.
Text and binary files in Python 3.0
Strictly speaking, the example in the prior section uses text files. In both Python 3.0 and 2.6, file type is determined by the second argument to open, the mode string—an included “b” means binary. Python has always supported both text and binary files, but in Python 3.0 there is a sharper distinction between the two:
Text files represent content as normal str strings, perform Unicode encoding and decoding automatically, and perform end-of-line translation by default.
Binary files represent content as a special bytes string type and allow programs to access file content unaltered.
In contrast, Python 2.6 text files handle both 8-bit text and binary data, and a special string type and file interface (unicode strings and codecs.open) handles Unicode text. The differences in Python 3.0 stem from the fact that simple and Unicode text have been merged in the normal string type—which makes sense, given that all text is Unicode, including ASCII and other 8-bit encodings.
Because most programmers deal only with ASCII text, they can get by with the basic text file interface used in the prior example, and normal strings. All strings are technically Unicode in 3.0, but ASCII users will not generally notice. In fact, files and strings work the same in 3.0 and 2.6 if your script’s scope is limited to such simple forms of text.
If you need to handle internationalized applications or byte-oriented data, though, the distinction in 3.0 impacts your code (usually for the better). In general, you must use bytes strings for binary files, and normal str strings for text files. Moreover, because text