Learning Python - Mark Lutz [203]

or other iteration tool, because all iteration tools normally work internally by calling __next__ on each iteration and catching the StopIteration exception to determine when to exit.

The net effect of this magic is that, as mentioned in Chapter 9, the best way to read a text file line by line today is to not read it at all—instead, allow the for loop to automatically call __next__ to advance to the next line on each iteration. The file object’s iterator will do the work of automatically loading lines as you go. The following, for example, reads a file line by line, printing the uppercase version of each line along the way, without ever explicitly reading from the file at all:

>>> for line in open('script1.py'):       # Use file iterators to read by lines
...     print(line.upper(), end='')       # Calls __next__, catches StopIteration
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)

Notice that the print uses end='' here to suppress adding a \n, because line strings already have one (without this, our output would be double-spaced). This is considered the best way to read text files line by line today, for three reasons: it’s the simplest to code, might be the quickest to run, and is the best in terms of memory usage. The older, original way to achieve the same effect with a for loop is to call the file readlines method to load the file’s content into memory as a list of line strings:

>>> for line in open('script1.py').readlines():
...     print(line.upper(), end='')
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)

This readlines technique still works, but it is not considered the best practice today and performs poorly in terms of memory usage. In fact, because this version really does load the entire file into memory all at once, it will not even work for files too big to fit into the memory space available on your computer. By contrast, because it reads one line at a time, the iterator-based version is immune to such memory-explosion issues. The iterator version might run quicker too, though this can vary per release (Python 3.0 made this advantage less clear-cut by rewriting I/O libraries to support Unicode text and be less system-dependent).
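The difference between the two approaches can be sketched as follows. This is only an illustration under stated assumptions: it writes its own small temporary file rather than relying on script1.py, and the names path, all_lines, and line_count are invented for the example.

```python
import os
import tempfile

# Build a small throwaway file so the sketch is self-contained
# (a stand-in for script1.py, not the book's actual file).
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write('import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n')
    path = tmp.name

# readlines: every line of the file is held in one list at the same time.
with open(path) as f:
    all_lines = f.readlines()

# File iterator: the loop fetches one line per iteration, so only the
# current line needs to be kept by this code.
line_count = 0
with open(path) as f:
    for line in f:
        line_count += 1

os.remove(path)
print(line_count, len(all_lines))
```

Both forms visit the same lines; the difference is that all_lines grows with the size of the file, while the iterator-based loop keeps only one line at a time.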

As mentioned in the prior chapter’s sidebar, Why You Will Care: File Scanners, it’s also possible to read a file line by line with a while loop:

>>> f = open('script1.py')
>>> while True:
...     line = f.readline()
...     if not line: break
...     print(line.upper(), end='')
...
...same output...

However, this may run slower than the iterator-based for loop version, because iterators run at C language speed inside Python, whereas the while loop version runs Python byte code through the Python virtual machine. Any time we trade Python code for C code, speed tends to increase. This is not an absolute truth, though, especially in Python 3.0; we’ll see timing techniques later in this book for measuring the relative speed of alternatives like these.
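As a rough preview of such a measurement, the standard library's timeit module can compare the two loop styles. This is a minimal sketch, not the timing technique the book develops later: the temporary file, the helper names scan_with_for and scan_with_while, and the repetition count are all assumptions made for illustration, and the results will vary by machine and Python release.

```python
import os
import tempfile
import timeit

# Create a throwaway test file of 10,000 short lines (an arbitrary size).
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    tmp.write('spam line\n' * 10000)
    path = tmp.name

def scan_with_for():
    # Hypothetical helper: count lines with the file iterator.
    count = 0
    with open(path) as f:
        for line in f:
            count += 1
    return count

def scan_with_while():
    # Hypothetical helper: count lines with explicit readline calls.
    count = 0
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                break
            count += 1
    return count

# Check that both versions see the same lines before timing them.
counts = (scan_with_for(), scan_with_while())

# Total seconds for 20 runs of each function.
for_time = timeit.timeit(scan_with_for, number=20)
while_time = timeit.timeit(scan_with_while, number=20)
os.remove(path)

print('for loop:   %.4f seconds' % for_time)
print('while loop: %.4f seconds' % while_time)
```

No expected timings are shown here on purpose; the point is only that the comparison is easy to run on your own system.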

Manual Iteration: iter and next

To support manual iteration code (with less typing), Python 3.0 also provides a built-in function, next, that automatically calls an object’s __next__ method. Given an iterator object X, the call next(X) is the same as X.__next__(), but noticeably simpler. With files, for instance, either form may be used:

>>> f = open('script1.py')
>>> f.__next__()                  # Call iteration method directly
'import sys\n'
>>> f.__next__()
'print(sys.path)\n'

>>> f = open('script1.py')
>>> next(f)                       # next built-in calls __next__
'import sys\n'
>>> next(f)
'print(sys.path)\n'

Technically, there is one more piece to the iteration protocol. When the for loop begins, it obtains an iterator from the iterable object by passing it to the iter built-in function; the object returned by iter has the required __next__ method. This becomes obvious if we look at how for loops internally process built-in sequence types such as lists:

>>> L = [1, 2, 3]
>>> I = iter(L)                   # Obtain an iterator object
>>> I.__next__()                  # Call __next__ to advance to next item
1
>>> I.__next__()
2
>>> I.__next__()
3
>>> I.__next__()
Traceback (most recent
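Putting the pieces together, the steps a for loop performs can be approximated by hand. The following is a sketch of the protocol described above rather than Python's actual internal implementation; the names auto and manual are invented for the comparison.

```python
L = [1, 2, 3]

# Automatic iteration: the for loop calls iter and __next__ for us.
auto = []
for x in L:
    auto.append(x ** 2)

# Manual iteration: roughly what the for loop does behind the scenes.
manual = []
I = iter(L)                   # Obtain an iterator from the iterable
while True:
    try:
        x = next(I)           # Same as I.__next__()
    except StopIteration:     # Raised when there are no more items
        break
    manual.append(x ** 2)

print(auto)                   # [1, 4, 9]
print(manual)                 # [1, 4, 9]
```

The try statement catches the StopIteration exception that ends the manual session shown above, which is how the loop knows when to exit.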
