On Thu, May 3, 2012 at 2:50 PM, Robert Elsner <[email protected]> wrote:
>
> On 03.05.2012 15:45, Robert Kern wrote:
>> On Thu, May 3, 2012 at 2:24 PM, Robert Elsner <[email protected]> wrote:
>>> Hello Everybody,
>>>
>>> Is there any news on the status of np.bincount with respect to "big"
>>> numbers? It seems I have just been bitten by #225. Is there an
>>> efficient way around it? I found the np.histogram function painfully
>>> slow.
>>>
>>> Below is a simple script that demonstrates bincount failing with a
>>> memory error on big numbers:
>>>
>>> import numpy as np
>>>
>>> x = np.array((30e9,)).astype(int)
>>> np.bincount(x)
>>>
>>> Any good ideas on how to work around it? My arrays contain about 50M
>>> entries in the range from 0 to 30e9, and I would like to have them
>>> bincounted...
>>
>> You need a sparse data structure, then. Are you sure you even have
>> duplicates?
>>
>> Anyway, I won't work out all of the details, but let me sketch
>> something that might get you your answers. First, sort your array.
>> Then use np.not_equal(x[:-1], x[1:]) as a mask on np.arange(1, len(x))
>> to find the indices where each sorted value changes over to the next.
>> The np.diff() of that should give you the size of each run. Use
>> np.unique to get the sorted unique values to match up with those
>> sizes.
>>
>> Fixing all of the off-by-one errors and dealing with the boundary
>> conditions correctly is left as an exercise for the reader.
>
> ?? I suspect that this mail was meant to end up in the thread about
> sparse array data?
No, I am responding to you.

--
Robert Kern
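
To fill in the details of that sketch, here is a minimal, illustrative implementation of the sort-and-diff approach. The function name sparse_bincount and the empty-input guard are additions for this example, and np.flatnonzero(x[:-1] != x[1:]) + 1 stands in for the equivalent np.not_equal/np.arange masking described above:

    import numpy as np

    def sparse_bincount(x):
        # Count occurrences of each value in x without allocating a
        # dense result of length max(x) + 1.  Returns (values, counts).
        x = np.sort(x)
        if len(x) == 0:
            return x, np.zeros(0, dtype=np.intp)
        # Indices where the sorted value changes over to the next one,
        # i.e. the start of every run after the first.
        starts = np.flatnonzero(x[:-1] != x[1:]) + 1
        # Bracket the run starts with 0 and len(x) so that np.diff
        # yields the length of every run, including the first and last.
        edges = np.concatenate(([0], starts, [len(x)]))
        counts = np.diff(edges)
        values = x[edges[:-1]]  # first element of each run
        return values, counts

    values, counts = sparse_bincount(np.array([30e9, 30e9, 7], dtype=np.int64))
    # values -> [7, 30000000000], counts -> [1, 2]

This costs one O(n log n) sort plus a few O(n) passes, and peak memory scales with len(x) rather than with the largest value, so 50M entries ranging up to 30e9 are manageable. (Later NumPy releases, 1.9 and up, added np.unique(x, return_counts=True), which does the same computation in one call, but that did not exist at the time of this thread.)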
