I agree with @Daniele's point: storing huge arrays in text files might indicate a bad process. But if these functions can be improved, why not? Unless the change turns out to be a burden.
Regarding the estimation of the array size, I don't see a big performance loss if the file iterator is exhausted one extra time in order to count the rows and pre-allocate arrays of the proper size, which avoids using a list of lists. The hardest part seems to be dealing with arrays of strings (perhaps easily solved with dtype=object) and structured arrays.

Cheers,
Saullo

2014-10-26 18:00 GMT+01:00 <numpy-discussion-requ...@scipy.org>:

> Message: 1
> Date: Sun, 26 Oct 2014 17:42:32 +0100
> From: Daniele Nicolodi <dani...@grinta.net>
> Subject: Re: [Numpy-discussion] Memory efficient alternative for
>         np.loadtxt and np.genfromtxt
> To: numpy-discussion@scipy.org
>
> On 26/10/14 09:46, Saullo Castro wrote:
> > I would like to start working on a memory efficient alternative for
> > np.loadtxt and np.genfromtxt that uses arrays instead of lists to store
> > the data while the file iterator is exhausted.
> >
> > ...
> >
> > I would be glad if you could share your experience on this matter.
>
> I'm of the opinion that if your workflow requires you to regularly load
> large arrays from text files, something else needs to be fixed rather
> than the numpy speed and memory usage in reading data from text files.
>
> There are a number of data formats that are interoperable and allow to
> store data much more efficiently. hdf5 is one natural choice, maybe with
> the blosc compressor.
>
> Cheers,
> Daniele
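
A minimal sketch of the two-pass idea from the reply above, assuming a seekable file object and a purely numeric, rectangular table. The name load_two_pass is hypothetical (it is not an existing NumPy function), and arrays of strings or structured arrays, the hard cases mentioned above, are not handled here.

```python
import io

import numpy as np


def load_two_pass(f, delimiter=None):
    """Hypothetical helper: read a numeric text table into a pre-allocated array.

    `f` must be a seekable text file object; blank lines are skipped.
    """
    # Pass 1: exhaust the iterator once just to count the data rows,
    # taking the number of columns from the first non-blank line.
    nrows = 0
    ncols = 0
    for line in f:
        line = line.strip()
        if not line:
            continue
        if nrows == 0:
            ncols = len(line.split(delimiter))
        nrows += 1

    # Allocate the final array once, instead of growing a list of lists.
    out = np.empty((nrows, ncols), dtype=float)

    # Pass 2: rewind and fill the array row by row.
    f.seek(0)
    i = 0
    for line in f:
        line = line.strip()
        if not line:
            continue
        out[i] = [float(x) for x in line.split(delimiter)]
        i += 1
    return out


# Usage with an in-memory file standing in for a large text file.
data = load_two_pass(io.StringIO("1 2 3\n4 5 6\n"))
print(data.shape)
```

Only one row is held as Python objects at any time, so peak memory stays close to the size of the final array, at the cost of reading the file twice.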
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion