Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread Olivier Delalleau
Note that if you are OK with an approximate solution, and you can assume your data is somewhat shuffled, a simple online algorithm that uses O(1) memory consists of:
- choosing a small step size delta
- initializing your percentile p to a more or less random value (a meaningful guess is better, though)
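For concreteness, a minimal sketch of this kind of constant-step online update, assuming the 95th percentile (q = 0.95); the step size and starting value below are illustrative choices, not values from the original post:

    import numpy as np

    def online_percentile(stream, q=0.95, delta=0.1, p=0.0):
        """One-pass, O(1)-memory estimate of the q-th quantile.

        Each sample nudges the running estimate p: up by delta*q when
        the sample is above p, down by delta*(1-q) when it is below,
        so p settles where a fraction q of the data lies beneath it.
        """
        for x in stream:
            if x > p:
                p += delta * q
            else:
                p -= delta * (1.0 - q)
        return p

    # Example: approaches the true 95th percentile of N(0, 1), about 1.645
    print(online_percentile(np.random.randn(100000), delta=0.01))

The estimate fluctuates on the scale of delta, so a smaller step gives a more precise but slower-moving answer.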

Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread questions anon
Thanks for your responses. Because of the size of the dataset I will still end up with the memory error if I calculate the median for each file; additionally, the files are not all the same size. I believe this memory problem will also arise with the cumulative distribution calculation, and I am not sure how to get around it.
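For context, a cumulative distribution over fixed bins does not have this problem: it needs only one small array of counts, no matter how many files there are or how their sizes differ. A sketch is below; the bin range, bin count, and the load() helper are hypothetical, and you need rough bounds on the data to set the bins (values outside [lo, hi] are silently dropped by np.histogram):

    import numpy as np

    def percentile_from_histogram(files, load, lo, hi, q=95, nbins=10000):
        """Approximate q-th percentile across many files via a shared
        fixed-bin histogram; memory use is one length-nbins count array."""
        edges = np.linspace(lo, hi, nbins + 1)
        counts = np.zeros(nbins, dtype=np.int64)
        for f in files:
            data = load(f)                 # one file in memory at a time
            counts += np.histogram(data, bins=edges)[0]
        cdf = np.cumsum(counts) / float(counts.sum())
        # Right edge of the first bin where the CDF reaches q percent
        return edges[1:][np.searchsorted(cdf, q / 100.0)]

The answer is accurate to within one bin width, (hi - lo) / nbins, which is usually far below the noise in data of this size.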

Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread Brett Olsen
On Tue, Jan 24, 2012 at 6:22 PM, questions anon wrote: > I need some help understanding how to loop through many arrays to calculate > the 95th percentile. > I can easily do this by using numpy.concatenate to make one big array and > then finding the 95th percentile using numpy.percentile but this causes a memory error.

Re: [Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread Marc Shivers
This is probably not the best way to do it, but I think it would work: you could take two passes through your data, first calculating and storing the median for each file and the number of elements in each file. From those data, you can get a lower bound on the 95th percentile of the combined data.
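A sketch of one reading of that two-pass idea follows. The bound rests on this observation: any value sitting below the medians of files that hold the top ~10% of the data can have at most 90% + (10% / 2) = 95% of all values beneath it, so it cannot exceed the combined 95th percentile. load() is a hypothetical per-file loader, and the final step uses the simple nearest-rank definition rather than numpy.percentile's interpolation:

    import numpy as np

    def combined_p95(files, load):
        # Pass 1: per-file median and element count.
        meds, counts = [], []
        for f in files:
            data = load(f)
            meds.append(np.median(data))
            counts.append(data.size)
        meds, counts = np.asarray(meds), np.asarray(counts, dtype=np.int64)
        total = counts.sum()

        # Lower bound on the combined 95th percentile: the largest
        # per-file median below which at most 90% of the data can lie.
        order = np.argsort(meds)
        cum = np.cumsum(counts[order])
        j = np.searchsorted(cum, 0.90 * total, side='right') - 1
        lower = meds[order][max(j, 0)]  # max() guards the one-big-file case

        # Pass 2: keep only the tail above the bound, plus a count of
        # everything below it, and read the answer off the exact rank.
        below, tail = 0, []
        for f in files:
            data = load(f)
            below += int(np.count_nonzero(data < lower))
            tail.append(data[data >= lower])
        tail = np.sort(np.concatenate(tail))
        k = int(np.ceil(0.95 * total)) - 1 - below  # 0-based rank in tail
        return tail[max(k, 0)]

Only the tail above the bound, typically the top 5-15% of the data, is ever held in memory at once, which is where the saving comes from.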

[Numpy-discussion] numpy.percentile multiple arrays

2012-01-24 Thread questions anon
I need some help understanding how to loop through many arrays to calculate the 95th percentile. I can easily do this by using numpy.concatenate to make one big array and then finding the 95th percentile using numpy.percentile, but this causes a memory error when I want to run this on 100s of netcdf files.
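For reference, the approach described above looks something like the following; the netCDF4 package and the 'TSFC' variable name are assumptions for illustration. Every file's data ends up in memory at once, which is what fails on hundreds of files:

    import numpy as np
    from netCDF4 import Dataset

    files = ['file001.nc', 'file002.nc']  # in practice, hundreds of paths

    # Concatenate everything into one flat array, then take the percentile.
    all_data = np.concatenate([Dataset(f).variables['TSFC'][:].ravel()
                               for f in files])
    print(np.percentile(all_data, 95))

The replies in this thread are all strategies for getting the same number without ever materializing all_data.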