Selecting based on probabilities
The brother of a co-worker asked me for a routine to get incides randomly that matched a given distribution. His example was this: given an array of probabilities that sum to 1.0 ([0.1, 0.4, 0.3, 0.2]), produce a list where each of the indices (0, 1, 2, 3) is selected according to the array of possibilities. For instance: index 0 is selected 10% of the time, index 1 is selected 40% of the time, index 2 is selected 30% of the time, and index 3 is selected 20% of the time.
The list of probabilities must sum to 1.0, with no negative values.
I gave him an initial solution that wouldn't scale well, and then in the morning (I do my best thinking at home while sitting in bed) I gave him this one. This one should scale nicely. One modification to keep the size down would be to directly reference the initial array instead of creating the second array. That requires more computation.
Here's the python version of this algorithm:
import random class RandomSelection( object ): def __init__(self, a): self.a = a self.b = [0.0] self.loadB() def loadB(self): for i in range( len( self.a )): self.b.append( self.a[i] +self.b[i]) def getSelection(self, rnd): iMin = 0 iMax = len(a) while(True): test = (iMax-iMin)//2 + iMin if rnd < self.b[test]: iMax = test else: iMin = test if iMin +1 == iMax or iMax == 1: return (rnd, iMin, self.a[iMin]) # test program: a = [0.1,0.4,0.3,0.2] print('a:',a) r = RandomSelection( a ) for i in range(20): rnd = random.random() print( r.getSelection( rnd )) print('done')
The python test code is here.
The selection of the random number could be moved to inside the getSelection method as well.