Luke Paireepinart wrote:
Can you explain what your function is doing and also post some test code to
profile it?
On Sat, Feb 20, 2010 at 10:22 AM, Andrew Fithian <afit...@gmail.com> wrote:
Hi tutor,
I'm have a statistical bootstrapping script that is bottlenecking on a
python function sample_with_replacement(). I wrote this function myself
because I couldn't find a similar function in python's random library. This
is the fastest version of the function I could come up with (I used
cProfile.run() to time every version I wrote) but it's not fast enough, can
you help me speed it up even more?
import random
def sample_with_replacement(list):
l = len(list) # the sample needs to be as long as list
r = xrange(l)
_random = random.random
return [list[int(_random()*l)] for i in r] # using
list[int(_random()*l)] is faster than random.choice(list)
FWIW, my bootstrapping script is spending roughly half of the run time in
sample_with_replacement() much more than any other function or method.
Thanks in advance for any advice you can give me.
-Drew
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
list and l are poor names for locals. The former because it's a
built-in type, and the latter because it looks too much like a 1.
You don't say how big these lists are, but I'll assume they're large
enough that the extra time spent creating the 'l' and 'r' variables is
irrelevant.
I suspect you could gain some speed by using random.randrange instead of
multiplying random.random by the length.
And depending on how the caller is using the data, you might gain some
by returning a generator expression instead of a list. Certainly you
could reduce the memory footprint.
I wonder why you assume the output list has to be the same size as the
input list. Since you're sampling with replacement, you're not using
the whole list anyway. So I'd have defined the function to take a
second argument, the length of desired array. And if you could accept a
generator instead of a list, you don't care how long it is, so let it be
infinite.
(untested)
def sample(mylist):
mylistlen = len(mylist)
randrange = random.randrange
while True:
yield mylist[ randrange(0, mylistlen)]
DaveA
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor