I'm not sure why the memory doubling is necessary. Isn't it possible to
preallocate the arrays and write to them? I suppose this might be
inefficient if you end up reading only a small subset of rows out of a
mostly corrupt file, but that seems to be a rather uncommon corner case.
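
For concreteness, here is a minimal two-pass sketch of what I have in
mind (assuming a plain whitespace-delimited file of floats; the function
name and the details are mine, not anything in loadtxt today):

import numpy as np

def loadtxt_preallocated(fname, dtype=float):
    # First pass: count rows and columns so the result can be
    # allocated once at its final size.
    with open(fname) as f:
        ncols = len(f.readline().split())
        nrows = 1 + sum(1 for _ in f)
    out = np.empty((nrows, ncols), dtype=dtype)
    # Second pass: parse each line straight into the array;
    # NumPy casts the string fields to dtype during assignment.
    with open(fname) as f:
        for i, line in enumerate(f):
            out[i] = line.split()
    return out

The obvious cost is reading the file twice, which is also why a growing
buffer may still win for streams that cannot be rewound.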

Either way, I'd say a doubling of memory use is fair game for NumPy:
generality is more important than absolute performance. The most important
thing is that temporary Python data structures are avoided. That shouldn't
be too hard to accomplish, and would realize most of the performance and
memory gains, I imagine. Something along the lines of the sketch below.
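
That is, one pass with geometric growth of a NumPy buffer, so the peak
footprint stays around 2x the final size (again just a sketch, with
made-up names and a whitespace-delimited float format assumed):

import numpy as np

def loadtxt_growing(fname, dtype=float):
    with open(fname) as f:
        first = f.readline().split()
        ncols = len(first)
        # Start small and double on demand; no per-row Python
        # lists or tuples are kept alive between iterations.
        out = np.empty((1024, ncols), dtype=dtype)
        out[0] = first
        nrows = 1
        for line in f:
            if nrows == out.shape[0]:
                bigger = np.empty((2 * out.shape[0], ncols), dtype=dtype)
                bigger[:nrows] = out
                out = bigger
            out[nrows] = line.split()
            nrows += 1
    # Trim (and copy, so the oversized buffer can be freed).
    return out[:nrows].copy()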

On Sun, Oct 26, 2014 at 12:54 PM, Jeff Reback <jeffreb...@gmail.com> wrote:

> you should have a read here:
> http://wesmckinney.com/blog/?p=543
>
> going below the 2x memory usage on read-in is non-trivial and costly in
> terms of performance
>
> On Oct 26, 2014, at 4:46 AM, Saullo Castro <saullogiov...@gmail.com>
> wrote:
>
> I would like to start working on a memory-efficient alternative to
> np.loadtxt and np.genfromtxt that uses arrays instead of lists to store the
> data while the file iterator is exhausted.
>
> The motivation came from this SO question:
>
> http://stackoverflow.com/q/26569852/832621
>
> where for huge arrays the current NumPy ASCII readers are really slow and
> require ~6 times more memory. I also tested this case with Pandas'
> read_csv(), which required 2 times more memory.
>
> I would be glad if you could share your experience on this matter.
>
> Greetings,
> Saullo
>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
