Note that if you are OK with an approximate solution, and you can assume
your data is somewhat shuffled, a simple online algorithm that uses no
memory consists of:
- choosing a small step size delta
- initializing your percentile p to a more or less random value (a
meaningful guess is better, though); see the sketch after this list
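
The message is cut off before the actual update step, but the usual
stochastic-approximation version of the idea looks something like the
sketch below; the update rule, the constants, and the update_percentile
helper are my own filling-in rather than anything stated above.

import numpy as np

def update_percentile(p, x, delta, q=0.95):
    # Assumed update rule (the message is cut off before this step): nudge p
    # up by q*delta when the sample lies above it, down by (1 - q)*delta
    # otherwise, so p drifts toward the true q-th percentile.
    if x > p:
        return p + q * delta
    return p - (1.0 - q) * delta

# Illustrative usage on a shuffled stream of samples.
rng = np.random.default_rng(0)
stream = rng.normal(size=100_000)

delta = 0.01   # small step size
p = 0.0        # more or less random initial guess
for x in stream:
    p = update_percentile(p, x, delta)

print(p)                          # approximate 95th percentile
print(np.percentile(stream, 95))  # exact value, for comparison

With a fixed step the estimate keeps oscillating around the true value, so
a smaller delta gives a more precise but slower-moving estimate.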
Thanks for your responses.
Because of the size of the dataset I will still end up with the memory
error if I calculate the median for each file; additionally, the files are
not all the same size. I believe this memory problem will still arise with
the cumulative distribution calculation as well, and I am not sure how to
get around it.
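
(For reference, one way the cumulative distribution calculation is often
kept memory-safe is to accumulate a single fixed-bin histogram across all
the files. The value range, the file names, and the load_file helper below
are placeholders, so treat this only as an illustrative sketch.)

import numpy as np

def load_file(fname):
    """Placeholder: stands in for reading one netCDF file into a flat array."""
    rng = np.random.default_rng(abs(hash(fname)) % (2 ** 32))
    return rng.normal(loc=20.0, scale=5.0, size=50_000)

filenames = ["file1.nc", "file2.nc", "file3.nc"]  # hypothetical file names
vmin, vmax = -10.0, 50.0                # assumed known (or pre-scanned) value range
edges = np.linspace(vmin, vmax, 1001)   # fixed bins, so memory stays constant
counts = np.zeros(len(edges) - 1, dtype=np.int64)

# Only one file is in memory at a time; only the bin counts accumulate.
for fname in filenames:
    data = load_file(fname).ravel()
    counts += np.histogram(data, bins=edges)[0]

cdf = np.cumsum(counts) / counts.sum()
bin_idx = np.searchsorted(cdf, 0.95)    # first bin whose cumulative share reaches 95%
print(edges[bin_idx + 1])               # its upper edge approximates the 95th percentile

The accuracy is limited by the bin width, but the memory used does not grow
with the number or size of the files.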
On Tue, Jan 24, 2012 at 6:22 PM, questions anon wrote:
> I need some help understanding how to loop through many arrays to calculate
> the 95th percentile.
> I can easily do this by using numpy.concatenate to make one big array and
> then finding the 95th percentile using numpy.percentile, but this causes a
> memory error when I want to run this on hundreds of netCDF files.
This is probably not the best way to do it, but I think it would work:
you could take two passes through your data, first calculating and storing
the median for each file and the number of elements in each file. From
those data, you can get a lower bound on the 95th percentile of the
combined dataset.
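
The message is truncated before the details, so the bound below is only my
reading of how the per-file medians and counts could be turned into a lower
bound; load_file and the file names are placeholders.

import numpy as np

def load_file(fname):
    """Placeholder: stands in for reading one file into a flat array."""
    rng = np.random.default_rng(abs(hash(fname)) % (2 ** 32))
    return rng.normal(loc=20.0, scale=5.0, size=50_000)

filenames = ["file1.nc", "file2.nc", "file3.nc"]  # hypothetical file names

# First pass: per-file medians and element counts, one file in memory at a time.
medians, counts = [], []
for fname in filenames:
    data = load_file(fname).ravel()
    medians.append(np.median(data))
    counts.append(data.size)
medians, counts = np.array(medians), np.array(counts)
total = counts.sum()

def is_lower_bound(v):
    # At most half of each file whose median is >= v, plus all of each file
    # whose median is < v, can lie below v; if that is under 95% of all
    # elements, the combined 95th percentile cannot be below v.
    below_at_most = counts[medians < v].sum() + counts[medians >= v].sum() / 2.0
    return below_at_most < 0.95 * total

# The largest per-file median that still passes the test is a usable bound.
lower_bound = max(v for v in medians if is_lower_bound(v))
print(lower_bound)

Presumably the second pass would then only need to look at values above
that bound, which is what keeps the memory usage down.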
I need some help understanding how to loop through many arrays to calculate
the 95th percentile.
I can easily do this by using numpy.concatenate to make one big array and
then finding the 95th percentile using numpy.percentile, but this causes a
memory error when I want to run this on hundreds of netCDF files.
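
For reference, the approach described here looks roughly like the following
(the file names and the loader are placeholders); the concatenation is the
point where all of the data ends up in memory at once.

import numpy as np

def load_file(fname):
    """Placeholder: stands in for reading one netCDF file into a flat array."""
    rng = np.random.default_rng(abs(hash(fname)) % (2 ** 32))
    return rng.normal(loc=20.0, scale=5.0, size=50_000)

filenames = ["file1.nc", "file2.nc", "file3.nc"]  # in practice, hundreds of netCDF files

# Every file is loaded and held in memory at once; on a large collection this
# concatenation is exactly where the MemoryError appears.
all_data = np.concatenate([load_file(f).ravel() for f in filenames])
print(np.percentile(all_data, 95))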