Binary Search Overflow Bug

Binary search is an elementary algorithm for finding an element inside a sorted array. The idea is simple: the array is split in halves multiple times until the element is found. Although the idea is simple, the implementation is tricky. It is easy to make mistakes such as off-by-one due to division and boundary checks, and maybe dealing with duplicate values in the array.

But a typical implementation would look something like this:

public int search(int[] arr, int v) {
    int left = 0;
    int right = arr.length - 1;
    while (left <= right) {
        int mid = (left + right) / 2;
        int midv = arr[mid];
        if (midv < v)
            left = mid + 1
        else if (midv > v)
            right = mid - 1;
        else
            return mid;
    }
    return -(left + 1);
}

This implementation exists everywhere, and it is correct in most cases. However, the only flaw it contains is that

int mid = (left + right) / 2;

could cause an overflow error if left + right is more than what int can store. The user of search would not expect such flow when the array is large enough. This bug was even in the java library for few years before it was discovered and fixed, which tells you how easy it is to implement this search incorrectly. The fix is of course simple:

int mid = left + (right - left) / 2;

or you can use the shift operator to resolve this as in the implementation of binary search in golang (which returns the index at which the element could be in).

func Search(n int, f func(int) bool) int {
	// Define f(-1) == false and f(n) == true.
	// Invariant: f(i-1) == false, f(j) == true.
	i, j := 0, n
	for i < j {
		h := int(uint(i+j) >> 1) // avoid overflow when computing h
		// i ≤ h < j
		if !f(h) {
			i = h + 1 // preserves f(i-1) == false
		} else {
			j = h // preserves f(j) == true
		}
	}
	// i == j, f(i-1) == false, and f(j) (= f(i)) == true  =>  answer is i.
	return i
}

This bug is discussed in multiple places such as the wiki page or Google AI Blog. There is no clear lesson here, but the best advice is to not implement an algorithm if it is provided by the standard library of the language you use.