Online Book Reader

Home Category

Beautiful Code [98]

By Root 5191 0
M. D. McIlroy ("Engineering a sort function," Software–Practice and Experience, Vol. 23, No. 11) describes a serious performance bug in the venerable Unix qsort function. We set out to build a new C library sort function, and considered many different algorithms for the task, including Merge Sort and Heap Sort. After comparing several possible implementations, we settled on a version of the Quicksort algorithm. That paper describes how we engineered a new function that was clearer, faster, and more robust than its competitors—partly because it was smaller. Gordon Bell's sage advice proved true: "The cheapest, fastest, and most reliable components of a computer system are those that aren't there." That function has now been widely used for over a decade with no reports of failure.

Considering the gains that could be achieved by reducing code size, I finally asked myself a third variant of the question that began this chapter. "What is the most beautiful code that you never wrote?" How was I able to accomplish a great deal with very little? The answer was once again related to Quicksort, specifically, the analysis of its performance. The next section tells that tale.

The Most Beautiful Code I Never Wrote > More and More with Less and Less

3.2. More and More with Less and Less

Quicksort is an elegant algorithm that lends itself to subtle analysis. Around 1980, I had a wonderful discussion with Tony Hoare about the history of his algorithm. He told me that when he first developed Quicksort, he thought it was too simple to publish, and only wrote his classic "Quicksort" paper after he was able to analyze its expected runtime.

It is easy to see that in the worst case, Quicksort might take about n2 time to sort an array of n elements. In the best case, it chooses the median value as a partitioning element, and therefore sorts an array in about n lg n comparisons. So, how many comparisons does it use on the average for a random array of n distinct values?

Hoare's analysis of this question is beautiful, but unfortunately over the mathematical heads of many programmers. When I taught Quicksort to undergraduates, I was frustrated that many just didn't "get" the proof, even after sincere effort. We'll now attack that problem experimentally. We'll start with Hoare's program, and eventually end up with an analysis close to his.

Our task is to modify Example 3-1 of the randomizing Quicksort code to analyze the average number of comparisons used to sort an array of distinct inputs. We will also attempt to gain maximum insight with minimal code, runtime, and space.

To determine the average number of comparisons, we first augment the program to count them. To do this, we increment the variable comps before the comparison in the inner loop (Example 3-2).

Example 3-2. Quicksort inner loop instrumented to count comparisons

for (i = l+1; i <= u; i++) {

comps++;

if (x[i] < x[l])

swap(++m, i);

}

If we run the program for one value of n, we'll see how many comparisons that particular run takes. If we repeat that for many runs over many values of n, and analyze the results statistically, we'll observe that, on average, Quicksort takes about 1.4 n lg n comparisons to sort n elements.

That isn't a bad way to gain insight into the behavior of a program. Thirteen lines of code and a few experiments can reveal a lot. A famous quote attributed to writers such as Blaise Pascal and T. S. Eliot states that, "If I had more time, I would have written you a shorter letter." We have the time, so let's experiment with the code to attempt to create a shorter (and better) program.

We'll play the game of speeding up that experiment, trying to increase statistical accuracy and programming insight. Because the inner loop always makes precisely u-l comparisons, we can make the program a tiny bit faster by counting those comparisons in a single operation outside the loop. This change yields the Quicksort shown in Example 3-3.

Example 3-3. Quicksort inner loop with increment moved out of loop

comps += u-l;

for (i = l+1; i <= u; i++)

Return Main Page Previous Page Next Page

®Online Book Reader