The Python list structure already stores the length of the list (it
increments / decrements on appends / pops, etc.), so you'd be
*re*computing a value that you already have.

Yup, it does. I was thinking of using each thread to get the len() of
each sub-list in parallel, so I wouldn't have to walk the entire list
taking the length of each sub-list sequentially.

I think the best thing at this point would be for you to implement
both and profile the two implementations to compare runtimes. My
suggestion would be to implement the Python-side wrangling first and
time that vs. my <10-line algo above (I suspect that just the
wrangling will be slower than my solution, much less any call into
CUDA), then add the CUDA code after that if it still looks like
a performance win.
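A sketch of how that timing might look with the stdlib timeit module (sub_lists and python_side are hypothetical names; the "<10 line algo" and the CUDA variant aren't shown here and would be timed the same way, including any host-to-device transfer, before deciding the CUDA path is a win):

```python
import timeit

# Hypothetical test data: many short sub-lists.
sub_lists = [[0] * 5 for _ in range(100_000)]

def python_side():
    # Pure-Python pass: one O(1) len() per sub-list.
    return [len(s) for s in sub_lists]

# Average over repeated runs; compare this number against the
# equivalent timing of the CUDA-backed implementation.
t = timeit.timeit(python_side, number=10)
print(f"python-side wrangling: {t:.3f} s for 10 runs")
```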

Yes, it will mostly be empirical testing and then tweaking. Thanks again.

Best regards,

./francis
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
