This is probably not the best way to do it, but I think it would work: you could take two passes through your data, first calculating and storing the median and the number of elements for each file. From those, you can get a lower bound on the 95th percentile of the combined dataset. For example, if all the files are the same size and you've got 100 of them, then the 95th percentile of the full dataset is at least as large as the 90th percentile of the individual file medians: the 10 files whose medians exceed that value each have at least half their values above it, so at least 5% of all the values lie above it. Once you've got that cut-off value, go back through your files and pull out only the values larger than the cut-off. Then you just need to figure out which percentile within this subset corresponds to the 95th percentile of the full dataset.
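Something like the sketch below is what I have in mind (untested). It reuses MainFolder and the 'T_SFC' variable from your code below, assumes the variable comes back as a masked array, and computes a single overall percentile rather than the per-gridpoint axis=0 percentile your snippet asks for; for that you'd have to apply the same idea cell by cell.

import os
import numpy as np
from netCDF4 import Dataset

def nc_paths(main_folder):
    # Walk the tree and yield every *.nc file (mirrors your os.walk loop).
    for path, dirs, files in os.walk(main_folder):
        for name in files:
            if name.endswith('.nc'):
                yield os.path.join(path, name)

def file_values(fname):
    # Read T_SFC and drop any masked entries, returning a flat 1-D array.
    nc = Dataset(fname, 'r')
    data = np.ma.compressed(nc.variables['T_SFC'][:])
    nc.close()
    return data

# Pass 1: per-file medians and counts (tiny compared to the data itself).
medians, counts = [], []
for fname in nc_paths(MainFolder):
    data = file_values(fname)
    medians.append(np.median(data))
    counts.append(data.size)

# Cut-off: with files of roughly equal size, the 90th percentile of the
# medians cannot exceed the 95th percentile of the combined data.
cutoff = np.percentile(medians, 90)

# Pass 2: keep only the values above the cut-off.
tail = np.concatenate([d[d > cutoff] for d in
                       (file_values(f) for f in nc_paths(MainFolder))])

# Map the overall 95th percentile onto the retained tail.  With numpy's
# linear interpolation, the 95th percentile of N sorted values sits at
# index 0.95*(N - 1); the tail occupies the last tail.size of those indices.
N = sum(counts)
idx_in_tail = 0.95 * (N - 1) - (N - tail.size)
Percentile95th = np.percentile(tail, 100.0 * idx_in_tail / (tail.size - 1))

Only the per-file medians, the counts, and the values above the cut-off ever have to sit in memory at once, which should stay small as long as the cut-off really lands near the tail.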
HTH,
Marc

On Tue, Jan 24, 2012 at 7:22 PM, questions anon <questions.a...@gmail.com> wrote:
> I need some help understanding how to loop through many arrays to
> calculate the 95th percentile.
> I can easily do this by using numpy.concatenate to make one big array and
> then finding the 95th percentile using numpy.percentile but this causes a
> memory error when I want to run this on 100's of netcdf files (see code
> below).
> Any alternative methods will be greatly appreciated.
>
> all_TSFC=[]
> for (path, dirs, files) in os.walk(MainFolder):
>     for dir in dirs:
>         print dir
>         path=path+'/'
>         for ncfile in files:
>             if ncfile[-3:]=='.nc':
>                 print "dealing with ncfiles:", ncfile
>                 ncfile=os.path.join(path,ncfile)
>                 ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
>                 TSFC=ncfile.variables['T_SFC'][:]
>                 ncfile.close()
>                 all_TSFC.append(TSFC)
>
> big_array=N.ma.concatenate(all_TSFC)
> Percentile95th=N.percentile(big_array, 95, axis=0)