Beautiful Code [235]
Multidimensional Iterators in NumPy > Iterator Design
19.4. Iterator Design
As described previously, an iterator is an abstract concept that encapsulates the idea of walking through each element of an array. The basic pseudocode for an iterator-based loop used in NumPy is:
set up iterator
(including pointing the current value to the first value in the array)
while iterator not done:
process the current value
point the current value to the next value
Everything but process the current value must be handled by the iterator and deserves discussion. As a result, there are basically three parts to the iterator design:
Moving to the next value
Termination
Setup
These will each be discussed separately. The design considerations that went into NumPy's iterators included making the overhead for using them inside of a loop as small as possible and making them as fast as possible.
19.4.1. Iterator Progression
The first decision is the order in which the elements will be taken. Although one could conceive of an iterator with no guarantee of the order in which the elements are taken, it is useful most of the time for the programmer to know the order. As a result, iterators in NumPy follow a specific order. The order is obtained using a relatively simple approach patterned after simple counting (with wrap-around) using a tuple of digits. Let a tuple of N integers represent the current position in the array, with (0,…,0) representing the first element of the n1 x n2 x … x nN array, and (n1-1, n2-1,…, nN-1) representing the last element.
This tuple of integers represents an N-digit counter. The next position is found by incrementing the last digit by one. If, during this process, the ith digit reaches ni, it is set to 0, and the (i-1)th digit is incremented by 1.
For example, the counting for a 3 x 2 x 4 array would proceed as follows:
(0,0,0) (0,0,1) (0,0,2) (0,0,3) (0,1,0) (0,1,1) (0,1,2) (0,1,3) (1,0,0) … (2,1,2) (2,1,3)
The next increment would produce (0,0,0), and the iterator would be set up to start all over again.
This counter is the essence of the NumPy iterator. The way it is incremented plays an important part in the iterator implementation. As a result, the implementation of the counter will be discussed in a subsequent section. Assuming that this counter that specifies the current position in the array is available, a pointer to the current value in the array can always be obtained by multiplying the integers of the counter by the stride values defined with the array, yielding the number of bytes to add to the memory address of the first element of the array.
For example, if data is a pointer to the start of the array, counter is the counter (or coordinate) array, and strides is an array of stride values, the following operations:
currptr = (char *)data;
for (i=0; i set currptr to the (first byte of the) current value of the array. In fact, rather than compute this multiplication every time a pointer to the current value is needed, the implementation can keep track of the pointer at the same time that it keeps track of the counter, making adjustments every time the counter is altered. For example, when the ith index of the counter is incremented by 1, currptr is incremented by strides[i]. When the ith index is reset to 0, this is the same as subtracting ni-1 from the current index, and therefore the memory address of the current value of the array should be decremented by (ni-1) -strides[i]. For the general case, the iterator maintains the counter specifying the position in the array along with a pointer to the current value. In the case of