On 01/-10/-28163 02:59 PM, Modulok wrote:
Does anyone know of an efficient way of doing a weighted random
choice? (I don't even know what algorithms like this would be called.)
Preferably, something that doesn't grow exponentially with the number
of elements in the list, or the size of their respective values.

For example: Assume I have a list of integer elements. I want to pick
an index from that list, using the value of said elements to control
the probability of selecting a given index:

w = [5, 20, 75]
wrand(w)

Where wrand() will return '2', (the index of 75),  about 75% of the time.
It would return '1', (the index of 20) about 20% of the time, and 0,
about 5% of the time. It could return the actual value, but returning
the index seems more flexible.

The implementation doesn't have to work exactly like this, even if the
weight values don't add up to 100, or are arbitrary, that's fine.
Hopefully you get the idea. Here's what I tried (it works, but is
slow):

### Begin Code Example ###
import random
random.seed(2) #<-- So we can reproduce the sequence for testing.

def wrandom(weights):
     '''
     Return a randomly selected index of the 'weights' list, using the
     values of that list to affect probability of said index being returned.
     '''
     debug = False
     flattened = []
     for i, w in enumerate(weights):
         if debug: print "i, w: ", i, w
         for k in range(w):
             flattened.append(i)
     if debug: print "flattened: ", flattened
     rnd = random.randint(0, (len(flattened) - 1))
     return flattened[rnd]

# Not test it:
print wrandom([5, 20, 75])
print wrandom([5, 20, 75])
print wrandom([5, 20, 75])

### End Code Example ###

It works and is easy enough to understand, but when the number of list
items gets large, or the weights get heavy, things get ugly.
-Modulok-

You're building a temporary list, then discarding it when the function returns. Your speed efficiency is likely to improve if you keep that "flattened list" and only do the conversion once.

But the separate problem of that flattened list possibly being much larger than necessary is more interesting. What you probably want to build instead is a list of accumulated probabilities. It would be the same size as the input list, and in this case, it would contain 5, 25, and 100, respectively. The list is always monotonic, since none of your probabilities can legally be negative.

Anyway, once you have that new list, you pick a random number in the range from 0 to newlist[-1], and do a binary search of the newlist for the first value > your random value.

And if the list is small, you can just scan through it linearly, which is much less code.

DaveA
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to