Selecting based on probabilities

The brother of a co-worker asked me for a routine to get incides randomly that matched a given distribution. His example was this: given an array of probabilities that sum to 1.0 ([0.1, 0.4, 0.3, 0.2]), produce a list where each of the indices (0, 1, 2, 3) is selected according to the array of possibilities. For instance: index 0 is selected 10% of the time, index 1 is selected 40% of the time, index 2 is selected 30% of the time, and index 3 is selected 20% of the time.

The list of probabilities must sum to 1.0, with no negative values.

I gave him an initial solution that wouldn't scale well, and then in the morning (I do my best thinking at home while sitting in bed) I gave him this one. This one should scale nicely. One modification to keep the size down would be to directly reference the initial array instead of creating the second array. That requires more computation.

Here's the python version of this algorithm:

	import random
	class RandomSelection( object ):
	    def __init__(self, a):
	        self.a = a
	        self.b = [0.0]
	    def loadB(self):
	        for i in range( len( self.a )):
	            self.b.append( self.a[i] +self.b[i])
	    def getSelection(self, rnd):
	        iMin = 0
	        iMax = len(a)
	            test = (iMax-iMin)//2 + iMin
	            if rnd < self.b[test]:
	                iMax = test
	                iMin = test
	            if iMin +1 == iMax or iMax == 1:
	                return (rnd, iMin, self.a[iMin])
	# test program:
	a = [0.1,0.4,0.3,0.2]
	r = RandomSelection( a )
	for i in range(20):
	    rnd = random.random()
	    print( r.getSelection( rnd ))

The python test code is here.

The selection of the random number could be moved to inside the getSelection method as well.