Sorting (Selecting) Items in an Array

Sometimes you don't actually need to sort a large list of items, but just need the n best matches to some supplied criteria. This is often the case in numerical modelling.

This routine will scan through your data and select the number of items you specify. It's a modification of the quicksort algorithm, but is not recursive. It selects one item as the "pivot", and walks it through the data until all the data on one side is "less than" this item, and everything on the other side is "greater than" this item. Then it checks to see if the pivot is now at the requested number, and if not, selects that side for the next scan.

Here's the python version of this algorithm:

def selectTop(grid, selectionNumber, criteria):
	sectionFirst = 0
	sectionLast = len(grid) -1
	while True:
		direction = -1
		walkerFirst = sectionFirst
		walkerLast = sectionLast
		while walkerFirst != walkerLast:
			if criteria(grid[walkerLast], grid[walkerFirst]):
				swapItems(grid, walkerFirst, walkerLast)
				direction *= -1
			if direction < 0: walkerLast -= 1
			else: walkerFirst += 1
		if walkerFirst < selectionNumber: sectionFirst = walkerFirst + 1
		elif selectionNumber < walkerFirst: sectionLast = walkerFirst - 1
		else: break

The python test code is here for testing, if needed. The fortran code from 1985 is long gone.

I was a junior officer at AFGWC on Offutt AFB in Omaha, NE, USA. My boss at the time was Major Jim Stobie, head of the Numerical Models section. We were developing the HIRAS weather analysis model, and at one point late in the development, the selecting of the best pieces of data in the Optimal Interpolation routine was the slowest item in the execution, taking about 250 seconds per run on the Cray X-MP/24.

Major Stobie asked me if there was anything I could do to improve the performance, and I took this problem home with me that evening and brought in the algorithm the next day. The first version cut the sorting time down to about 75 seconds, and the next day the version here brought it down to 64 seconds. The sorting was no longer the bottleneck, so we moved on to other optimizations.

I'd like to say the challenging nature of the work helped with our productivity, but that could also have been related to the times we'd drop everything and the office would head for the golf course as a team.