Online Book Reader

Home Category

Beautiful Code [110]

By Root 5075 0
while the other half were those same hostnames reversed, none of which succeeded. The 2,000 queries completed in an average of about .02 seconds, so Ruby's hash implementation can look up records in a hash containing 12 million or so records thousands of times per second.

Those 55 minutes to load up the data are troubling, but there are some tricks to address that. You could, for example, load it up once, then serialize the hash out and read it back in. And I didn't try particularly hard to optimize the program.

The program was easy and quick to write, and it runs fast once it's initialized, so its performance is good both in terms of waiting-for-the-program time and waiting-for-the-programmer time. Still, I'm unsatisfied. I have the feeling that there ought to be a way to get this kind of performance while burning less memory, less startup time, or both. It involves writing our own search code, though.

4.3.1. Binary Search

Nobody gets a Computer Science degree without studying a wide variety of search algorithms: trees, heaps, hashes, lists, and more. My favorite among all these is binary search. Let's try it on the who-fetched-what-when problem and then look at what makes it beautiful.

My first attempt at putting binary search to use was quite disappointing; while the data took 10 minutes less to load, it required almost 100 MB more memory than with the hash. Clearly, there are some surprising things about the Ruby array implementation. The search also ran several times slower (but still in the range of thousands per second), but this is not surprising at all because the algorithm is running in Ruby code rather than with the underlying hardcoded hash implementation.

The problem is that in Ruby everything is an object, and arrays are fairly abstracted things with lots of built-in magic. So, let's reimplement the program in Java, in which integers are just integers, and arrays come with very few extras.[]

[] This discussion of binary search borrows heavily from my 2003 piece, "On the Goodness of Binary Search," available online at http://www.tbray.org/ongoing/When/200x/2003/03/22/Binary.

Nothing could be simpler, conceptually, than binary search. You divide your search space in two and see whether you should be looking in the top or bottom half; then you repeat the exercise until done. Instructively, there are a great many ways to code this algorithm incorrectly, and several widely published versions contain bugs. The implementation mentioned in "On the Goodness of Binary Search," and shown in Java in Section 4.3.1, is based on one I learned from Gaston Gonnet, the lead developer of the Maple language for symbolic mathematics and currently Professor of Computer Science at ETH in Zürich.

Example 4-6. Binary search

1 package binary;

2

3 public class Finder {

4 public static int find(String[] keys, String target) {

5 int high = keys.length;

6 int low = -1;

7 while (high - low > 1) {

8 int probe = (low + high) >>> 1;

9 if (keys[probe].compareTo(target) > 0)

10 high = probe;

11 else

12 low = probe;

13 }

14 if (low == -1 || keys[low].compareTo(target) != 0)

15 return -1;

16 else

17 return low;

18 }

19 }

Key aspects of this program are as follows:

In lines 5–6, note that the high and low bounds are set one off the ends of the array, so neither are initially valid indices. This eliminates all sorts of corner cases.

The loop that starts in line 7 runs until the high and low bounds are adjacent; there is no testing to see whether the target has been found. Think for a minute whether you agree with this choice; we'll return to the question later.

The loop has two invariants. low is either –1 or points to something less than or equal to the target value. high is either one off the top of the array or points to something strictly greater than the target value.

Line 8 is particularly interesting. In an earlier version it read:

probe = (high + low) / 2;

but in June 2006, Java guru Josh Bloch showed how, in certain obscure circumstances, that code could lead to integer overflow (see http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html).

Return Main Page Previous Page Next Page

®Online Book Reader