On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes < [email protected]> wrote:
> > On 5. mai 2011, at 08.49, Benjamin Root wrote: > > > > > > > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes < > [email protected]> wrote: > > > > On 4. mai 2011, at 20.33, Benjamin Root wrote: > > > > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier < > [email protected]> wrote: > > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: > > > > > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions > written for this? Shouldn't we reuse them? Perhaps it's overkill, and > perhaps it will reintroduce the 'transposed' problem? > > > > > > Yes, good point, one could replace the > > > X.shape = (X.size, ) with X = np.atleast_1d(X), > > > but for the ndmin=2 case, we'd need to replace > > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - > > > not sure which solution is more efficient in terms of memory access > etc... > > > > > > Cheers, > > > Derek > > > > > > > > > I can confirm that the current behavior is not sufficient for all of > the original corner cases that ndmin was supposed to address. Keep in mind > that np.loadtxt takes a one-column data file and a one-row data file down to > the same shape. I don't see how the current code is able to produce the > correct array shape when ndmin=2. Do we have some sort of counter in > loadtxt for counting the number of rows and columns read? Could we use > those to help guide the ndmin=2 case? > > > > > > I think that using atleast_1d(X) might be a bit overkill, but it would > be very clear as to the code's intent. I don't think we have to worry about > memory usage if we limit its use to only situations where ndmin is greater > than the number of dimensions of the array. In those cases, the array is > either an empty result, a scalar value (in which memory access is trivial), > or 1-d (in which a transpose is cheap). > > > > What if one does things the other way around - avoid calling squeeze > until _after_ doing the atleast_Nd() magic? That way the row/column > information should be conserved, right? Also, we avoid transposing, memory > use, ... > > > > Oh, and someone could conceivably have a _looong_ 1D file, but would want > it read as a 2D array. > > > > Paul > > > > > > > > @Derek, good catch with noticing the error in the tests. We do still need > to handle the case I mentioned, however. I have attached an example script > to demonstrate the issue. In this script, I would expect the second-to-last > array to be a shape of (1, 5). I believe that the single-row, multi-column > case would actually be the more common type of edge-case encountered by > users than the others. Therefore, I believe that this ndmin fix is not > adequate until this is addressed. > > > > @Paul, we can't call squeeze after doing the atleast_Nd() magic. That > would just undo whatever we had just done. Also, wrt the transpose, a (1, > 100000) array looks the same in memory as a (100000, 1) array, right? > Agree. I thought more along the lines of (pseudocode-ish) > if ndmin == 0: > squeeze() > if ndmin == 1: > atleast_1D() > elif ndmin == 2: > atleast_2D() > else: > I don't rightly know what would go here, maybe raise ValueError? > > That would avoid the squeeze call before the atleast_Nd magic. But the code > was changed, so I think my comment doesn't make sense anymore. It's probably > fine the way it is! > > Paul > > I have thought of that too, but the problem with that approach is that after reading the file, X will have 2 or 3 dimensions, regardless of how many singleton dims were in the file. A squeeze will always be needed. Also, the purpose of squeeze is opposite that of the atleast_*d() functions: squeeze reduces dimensions, while atleast_*d will add dimensions. Therefore, I re-iterate... the patch by Derek gets the job done. I have tested it for a wide variety of inputs for both regular arrays and record arrays. Is there room for improvements? Yes, but I think that can wait for later. Derek's patch however fixes an important bug in the ndmin implementation and should be included for the release. Ben Root
_______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
