On 03.05.2012 15:45, Robert Kern wrote:
> On Thu, May 3, 2012 at 2:24 PM, Robert Elsner <[email protected]> wrote:
>> Hello everybody,
>>
>> is there any news on the status of np.bincount with respect to "big"
>> numbers? It seems I have just been bitten by #225. Is there an
>> efficient way around it? I found the np.histogram function painfully
>> slow.
>>
>> Below is a simple script that demonstrates bincount failing with a
>> memory error on big numbers:
>>
>> import numpy as np
>>
>> x = np.array((30e9,)).astype(int)
>> np.bincount(x)
>>
>> Any good idea how to work around it? My arrays contain some 50M
>> entries in the range from 0 to 30e9, and I would like to have them
>> bincounted...
>
> You need a sparse data structure, then. Are you sure you even have
> duplicates?
>
> Anyway, I won't work out all of the details, but let me sketch
> something that might get you your answers. First, sort your array.
> Then use np.not_equal(x[:-1], x[1:]) as a mask on np.arange(1, len(x))
> to find the indices where each sorted value changes over to the next.
> The np.diff() of that should give you the size of each run. Use
> np.unique to get the sorted unique values to match up with those
> sizes.
>
> Fixing all of the off-by-one errors and dealing with the boundary
> conditions correctly is left as an exercise for the reader.
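Filling in that exercise, a minimal sketch of the sorting approach (the
helper name sparse_bincount is made up here; it assumes a one-dimensional
integer array with at least one element):

import numpy as np

def sparse_bincount(x):
    # Return the sorted unique values of x and how often each occurs,
    # without allocating a dense array of length x.max() + 1 the way
    # np.bincount would.
    x = np.sort(x)
    # Indices into the sorted array where the value changes to the next.
    changes = np.arange(1, len(x))[np.not_equal(x[:-1], x[1:])]
    # Prepend 0 and append len(x) so np.diff also covers the first and
    # last runs (the boundary conditions mentioned above).
    boundaries = np.concatenate(([0], changes, [len(x)]))
    counts = np.diff(boundaries)
    values = np.unique(x)  # sorted unique values, one per run
    return values, counts

values, counts = sparse_bincount(np.array([3, int(30e9), 3, 7]))
# values -> [3, 7, 30000000000], counts -> [2, 1, 1]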
?? I suspect this mail was meant to end up in the thread about sparse
array data?
