Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012:
> > Even for binary, there are pathological cases, e.g. 1) reading a random
> > subset of nearly all rows. 2) reading a single column when rows are
> > small. In case 2 you will only go this route in the first place if you
> > need to save memory. The user should be aware of these issues.
>
> FWIW, this route actually doesn't save any memory as compared to np.memmap.
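
For concreteness, the "route" being discussed (reading one field of a binary record file without mapping whole rows) could be sketched roughly as below. This is only an illustration under assumptions: read_column, the example dtype, and the file name are hypothetical and are not the Recfile API or its C implementation. The idea is just to seek to the field's byte offset inside each requested row, so the only large allocation is the output column.

```python
import os
import tempfile

import numpy as np


def read_column(fname, dtype, field, rows):
    """Read one field for the given rows by seeking to each value.

    Hypothetical helper for illustration only; not the Recfile API.
    """
    dtype = np.dtype(dtype)
    # dtype.fields maps a field name to (field dtype, byte offset[, title]).
    field_dtype, offset = dtype.fields[field][:2]
    out = np.empty(len(rows), dtype=field_dtype)
    with open(fname, "rb") as f:
        for i, row in enumerate(rows):
            # Jump straight to this field inside the requested row.
            f.seek(int(row) * dtype.itemsize + offset)
            raw = f.read(field_dtype.itemsize)
            out[i] = np.frombuffer(raw, dtype=field_dtype)[0]
    return out


# Small self-contained demo with a made-up record layout.
dtype = np.dtype([("x", "f8"), ("y", "i4"), ("z", "f4")])
data = np.zeros(1000, dtype=dtype)
data["x"] = np.arange(1000) * 0.5
data["y"] = np.arange(1000)

with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, "records.bin")
    data.tofile(fname)
    rows = np.arange(0, 1000, 10)  # every tenth row
    x = read_column(fname, dtype, "x", rows)
```

A per-row Python loop like this is slow, and when rows are small the file buffering will still pull in neighbouring bytes, which is the pathological case 2 quoted above. The sketch is only meant to show where the bounded memory use comes from.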

Actually, for numpy.memmap you will read the whole file if you try to grab
a single column and read a large fraction of the rows. Here is an example
that will end up pulling the entire file into memory:

    import numpy

    # fname and dtype describe the binary record file being mapped
    mm = numpy.memmap(fname, dtype=dtype)
    rows = numpy.arange(mm.size)
    x = mm['x'][rows]

I just tested this on a 3G binary file and I'm sitting at 3G memory usage.
I believe this is because numpy.memmap only understands rows. I don't
fully understand the reason for that, but I suspect it is related to the
fact that the ndarray really only has a concept of itemsize, and the
fields are really just a reinterpretation of those bytes. It may be that
one could tweak the ndarray code to get around this. But I would
appreciate enlightenment on this subject.

This fact was the original motivator for writing my code; the text
reading ability came later.

> Cool. I'm just a little concerned that, since we seem to have like...
> 5 different implementations of this stuff all being worked on at the
> same time, we need to get some consensus on which features actually
> matter, so they can be melded together into the Single Best File
> Reader Evar. An interface where indexing and file-reading are combined
> is significantly more complicated than one where the core file-reading
> inner-loop can ignore indexing. So far I'm not sure why this
> complexity would be worthwhile, so that's what I'm trying to
> understand.

I think I've addressed the reason why the low level C code was written.
And I think a unified, high level interface to binary and text files,
which the Recfile class provides, is worthwhile.

Can you please say more about "...one where the core file-reading
inner-loop can ignore indexing"? I didn't catch the meaning.

-e

>
> Cheers,
> -- Nathaniel
>
> > Also, for some crazy ascii files we may want to revert to pure python
> > anyway, but I think these should be special cases that can be flagged
> > at runtime through keyword arguments to the python functions.
> >
> > BTW, did you mean to go off-list?
> >
> > cheers,
> >
> > -e
> > --
> > Erin Scott Sheldon
> > Brookhaven National Laboratory
--
Erin Scott Sheldon
Brookhaven National Laboratory
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion