Beautiful Code [132]
The way the midpoint is calculated is correct
then I can have confidence that binarySearch will do the right thing on very large arrays.
So the not-so-obvious, but beautiful, testing strategy is to isolate and test the pesky, overflow-prone calculation independently.
One possibility is to create a new method:
static int calculateMidpoint(int low, int high) {
return (low + high) >>> 1;
}
then change the following line in the code from:
int mid = (low + high) >>> 1;
to:
int mid = calculateMidpoint(low, high);
and then test the heck out of the calculateMidpoint method to make sure it always does the right thing.
I can already hear a few of you screaming about adding the overhead of a method call in an algorithm designed for maximum speed. But there's no need to cry foul. Here's why I believe this change to the code is not only acceptable, but the right thing to do:
These days, I can trust compiler optimization to do the right thing and inline the method for me, so there is no performance penalty.
The change makes the code more readable. I checked with several other Java programmers, and most of them were not familiar with the unsigned bit shift operator, or were not 100 percent sure how it worked. For them, seeing calculateMidpoint(low, high) is more obvious than seeing (low + high) >>> 1.
The change makes the code more testable.
This is actually a good example of how the very act of creating a test for your code will improve its design or legibility. In other words, testing can help you make your code more beautiful.
Here is a sample boundary test for the new calculateMidpoint method:
@Test
public void calculateMidpointWithBoundaryValues() {
assertEquals(0, calculateMidpoint (0, 1));
assertEquals(1, calculateMidpoint (0, 2));
assertEquals(1200000000, calculateMidpoint (1100000000, 1300000000));
assertEquals(Integer.MAX_VALUE - 2,
calculateMidpoint (Integer.MAX_VALUE-2, Integer.MAX_VALUE-1));
assertEquals(Integer.MAX_VALUE - 1,
calculateMidpoint (Integer.MAX_VALUE-1, Integer.MAX_VALUE));
}
I run the tests, and they pass. Good. I am now confident that calculating mid using the unfamiliar operator does what it's supposed to do within the range of array sizes I want to handle with this implementation of binary search.
The other set of boundary cases has to do with the position of the target number. I can think of three obvious boundary cases for the target item location: first item in the list, last item in the list, and right smack in the middle of the list. So, I write a simple test to check these cases:
Code View: Scroll / Show All
@Test
public void testBoundaryCasesForItemLocation() {
testArray = new int[] { -324, -3, -1, 0, 42, 99, 101 };
assertEquals(0, Util.binarySearch(testArray, -324)); // first position
assertEquals(3, Util.binarySearch(testArray, 0)); // middle position
assertEquals(6, Util.binarySearch(testArray, 101)); // last position
}
Note that in this test I used some negative numbers and 0, both in the array and for the target number. It had occurred to me, while reading the tests I had already written, that I had used only positive numbers. Since that's not part of the specification, I should introduce negative numbers and 0 in my tests. Which leads me to the following piece of testing wisdom:
The best way to think of more test cases is to start writing some test cases.
Now that I started to think about positive/negative numbers and 0, I realize that it would be good to have a couple of tests that use the minimum and maximum integer values.
public void testForMinAndMaxInteger() {
testArray = new int[] {
Integer.MIN_VALUE, -324, -3, -1, 0, 42, 99, 101, Integer.MAX_VALUE
};
assertEquals(0, Util.binarySearch(testArray, Integer.MIN_VALUE));
assertEquals(8, Util.binarySearch(testArray, Integer.MAX_VALUE));
}
So far, all the boundary cases I thought of passed, and I am starting to feel pretty confident. But then I think