On 26 Oct 2014 11:54, "Jeff Reback" <jeffreb...@gmail.com> wrote:
>
> you should have a read here/
> http://wesmckinney.com/blog/?p=543
>
> going below the 2x memory usage on read in is non trivial and costly in terms of performance

On Linux you can probably go below 2x overhead easily, by exploiting the fact that realloc on large memory blocks is basically O(1) (yes, really):
http://blog.httrack.com/blog/2014/04/05/a-story-of-realloc-and-laziness/

Sadly, OS X does not provide anything similar, and I can't tell for sure about Windows.

Though on further thought, the numbers Wes quotes there aren't actually the most informative: massif will tell you how much virtual memory you have allocated, but a lot of that is going to be a pure VM accounting trick. The output array memory will actually be allocated incrementally, one block at a time, as you fill it in. This means that if you can free each temporary chunk immediately after you copy it into the output array, then even simple approaches can have very low overhead. It's possible pandas's actual overhead is already closer to 1x than 2x, and this is just hidden by the tools Wes is using to measure it.

-n

> On Oct 26, 2014, at 4:46 AM, Saullo Castro <saullogiov...@gmail.com> wrote:
>
>> I would like to start working on a memory efficient alternative for np.loadtxt and np.genfromtxt that uses arrays instead of lists to store the data while the file iterator is exhausted.
>>
>> The motivation came from this SO question:
>>
>> http://stackoverflow.com/q/26569852/832621
>>
>> where for huge arrays the current NumPy ASCII readers are really slow and require ~6 times more memory. This case I tested with Pandas' read_csv() and it required 2 times more memory.
>>
>> I would be glad if you could share your experience on this matter.
>>
>> Greetings,
>> Saullo
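
A minimal sketch of the chunk-at-a-time idea from the reply above, in case it helps make it concrete. It uses only the standard-library array module so that it runs without dependencies; load_floats, ncols and chunk_rows are hypothetical names, not part of NumPy or pandas. It is an illustration under those assumptions, not how np.loadtxt or read_csv actually work, and it makes no claim about measured overhead.

```python
import io
from array import array
from itertools import islice


def load_floats(lines, ncols, chunk_rows=1024):
    """Parse whitespace-separated floats into one flat array of C doubles.

    Hypothetical helper: only a small temporary chunk is alive at any time,
    and it is dropped right after being copied into the output buffer.
    """
    out = array("d")  # compact storage, no per-value Python objects
    it = iter(lines)
    while True:
        rows = list(islice(it, chunk_rows))  # at most chunk_rows text lines
        if not rows:
            break  # iterator exhausted
        chunk = array("d")  # temporary buffer for this chunk only
        for line in rows:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != ncols:
                raise ValueError("expected %d columns, got %d" % (ncols, len(fields)))
            chunk.extend(map(float, fields))
        out.extend(chunk)  # copy into the output; the buffer grows in place if it can
        del chunk, rows  # release the temporaries before reading the next chunk
    return out


# Small demonstration on an in-memory "file" with three columns.
sample = io.StringIO("1 2 3\n4 5 6\n\n7 8 9\n")
values = load_floats(sample, ncols=3, chunk_rows=2)
print(len(values) // 3, "rows parsed")
```

The result is a flat buffer of doubles in row-major order. Since array objects support the buffer protocol, something like np.frombuffer(values, dtype=np.float64).reshape(-1, ncols) should give a 2-D view without another copy, assuming the platform's C double is 8 bytes.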
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion