Hi,

On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root <[email protected]> wrote:
>
>
> On Thursday, October 27, 2011, Charles R Harris <[email protected]>
> wrote:
>>
>>
>> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant <[email protected]>
>> wrote:
>>>
>>> That is a pretty good explanation. I find myself convinced by Matthew's
>>> arguments. I think that being able to separate ABSENT from IGNORED is a
>>> good idea. I also like being able to control SKIP and PROPAGATE (but I
>>> think the current implementation allows this already).
>>>
>>> What is the counter-argument to this proposal?
>>>
>>
>> What exactly do you find convincing? The current masks propagate by
>> default:
>>
>> In [1]: a = ones(5, maskna=1)
>>
>> In [2]: a[2] = NA
>>
>> In [3]: a
>> Out[3]: array([ 1., 1., NA, 1., 1.])
>>
>> In [4]: a + 1
>> Out[4]: array([ 2., 2., NA, 2., 2.])
>>
>> In [5]: a[2] = 10
>>
>> In [5]: a
>> Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True)
>>
>>
>> I don't see an essential difference between the implementation using masks
>> and one using bit patterns, the mask when attached to the original array
>> just adds a bit pattern by extending all the types by one byte, an approach
>> that easily extends to all existing and future types, which is why Mark went
>> that way for the first implementation given the time available. The masks
>> are hidden because folks wanted something that behaved more like R and also
>> because of the desire to combine the missing, ignore, and later possibly bit
>> patterns in a unified manner. Note that the pseudo assignment was also meant
>> to look like R. Adding true bit patterns to numpy isn't trivial and I
>> believe Mark was thinking of parametrized types for that.
>>
>> The main problems I see with masks are unified storage and possibly memory
>> use. The rest is just behavior and desired API and that can be adjusted
>> within the current implementation. There is nothing essentially masky about
>> masks.
>>
>> Chuck
>>
>>
>
> I think Chuck sums it up quite nicely. The implementation detail about
> using mask versus bit patterns can still be discussed and addressed.
> Personally, I just don't see how parameterized dtypes would be easier to use
> than the pseudo assignment.
>
> The elegance of Mark's solution was to consider the treatment of missing
> data in a unified manner. This puts missing data in a more prominent spot
> for extension builders, which should greatly improve support throughout the
> ecosystem.

Are extension builders then required to use the numpy C API to get their
data? Speaking as an extension builder, I would rather you gave me the mask
and the bitpattern information and let me do that myself.

> By letting there be a single missing data framework (instead of
> two) all that users need to figure out is when they want nan-like behavior
> (propagate) or to be more like masks (skip). Numpy takes care of the rest.
> There is a reason why I like using masked arrays: I don't have to
> use nansum in my library functions to guard against the possibility of
> receiving nans. Duck-typing is a good thing.
>
> My argument against separating IGNORE and PROPAGATE is that it becomes too
> tempting to want to mix these in an array, but the desired behavior would
> likely become ambiguous.
>
> There is one other problem that I just thought of that I don't think has
> been outlined in either NEP. What if I perform an operation between an
> array set up with propagate NAs and an array with skip NAs?

These are explicitly covered in the alterNEP:

https://gist.github.com/1056379/

Best,

Matthew
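
For readers following the SKIP versus PROPAGATE distinction in this thread, here is a minimal sketch of the two behaviours. It assumes NaN as the propagating marker and numpy.ma as the skipping mask; it does not use the maskna=/NA interface quoted above, and the helper name summarize is hypothetical, not part of NumPy.

```python
import numpy as np


def summarize(values, missing_index):
    """Sum `values` with one element treated as missing, three ways.

    Hypothetical helper for illustration only; not a NumPy function.
    """
    # PROPAGATE: mark the element with NaN, so a plain sum becomes NaN.
    propagated = np.array(values, dtype=float)
    propagated[missing_index] = np.nan
    propagate_sum = np.sum(propagated)

    # SKIP by calling a NaN-aware function explicitly.
    skip_sum_nan = np.nansum(propagated)

    # SKIP with a mask: the reduction ignores the masked element,
    # while the original value stays stored underneath the mask.
    masked = np.ma.masked_array(np.array(values, dtype=float))
    masked[missing_index] = np.ma.masked
    skip_sum_masked = masked.sum()

    return propagate_sum, skip_sum_nan, skip_sum_masked, masked


if __name__ == "__main__":
    p, s_nan, s_masked, m = summarize([1, 1, 1, 1, 1], 2)
    print(p, s_nan, s_masked)
    # The masked element's payload is still readable through .data
    print(m.data[2], m.mask[2])
```

With the masked array the caller needs no nansum-style guard, which is the duck-typing point made above; with NaN the missing value propagates unless every library function opts in to skipping it.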
