Hi,

On Tue, Jun 28, 2011 at 11:40 PM, Jason Grout <jason-s...@creativetrax.com> wrote:
> On 6/28/11 5:20 PM, Matthew Brett wrote:
>> Hi,
>>
>> On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> ...
>>> (You might think, what difference does it make if you *can* unmask an
>>> item? Us missing data folks could just ignore this feature. But:
>>> whatever we end up implementing is something that I will have to
>>> explain over and over to different people, most of them not
>>> particularly sophisticated programmers. And there's just no sensible
>>> way to explain this idea that if you store some particular value, then
>>> it replaces the old value, but if you store NA, then the old value is
>>> still there.
>>
>> Ouch - yes. No question, that is difficult to explain. Well, I
>> think the explanation might go like this:
>>
>> "Ah, yes, well, that's because in fact numpy records missing values by
>> using a 'mask'. So when you say `a[3] = np.NA`, what you mean is,
>> `a._mask = np.ones(a.shape, np.dtype(bool)); a._mask[3] = False`"
>>
>> Is that fair?
>
> Maybe instead of np.NA, we could say np.IGNORE, which sort of conveys
> the idea that the entry is still there, but we're just ignoring it. Of
> course, that goes against common convention, but it might be easier to
> explain.
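To make the quoted mask model concrete, here is a minimal plain-Python sketch. It is not NumPy code: `NA`, `MaskedList`, `unmask`, and the `skipna` keyword on `sum` are hypothetical stand-ins (the `skipna` name mirrors the keyword used in the NEP). Storing NA only clears a mask bit, so the old value survives underneath, and a reduction can either propagate NA (the "missing" reading) or skip masked entries (the "ignore" reading).

```python
# Hypothetical sketch of a mask-based container; not NumPy's actual API.

NA = object()  # hypothetical sentinel standing in for the proposed np.NA


class MaskedList:
    """Stores values plus a parallel mask; True in the mask means 'valid'."""

    def __init__(self, values):
        self._data = list(values)
        self._mask = [True] * len(self._data)  # every entry starts valid

    def __setitem__(self, i, value):
        if value is NA:
            # Storing NA only flips the mask bit; the old value stays in _data.
            self._mask[i] = False
        else:
            # Storing an ordinary value overwrites the data and marks it valid.
            self._data[i] = value
            self._mask[i] = True

    def __getitem__(self, i):
        return self._data[i] if self._mask[i] else NA

    def unmask(self, i):
        # Clearing the mask exposes whatever value was hidden underneath.
        self._mask[i] = True

    def sum(self, skipna=False):
        total = 0
        for value, valid in zip(self._data, self._mask):
            if not valid:
                if skipna:
                    continue  # "ignore" reading: drop masked entries
                return NA  # "missing" reading: NA propagates
            total += value
        return total


a = MaskedList([1.0, 3.0, 5.0, 7.0])
a[2] = NA
print(a[2] is NA)          # True
print(a.sum() is NA)       # True
print(a.sum(skipna=True))  # 11.0
a.unmask(2)
print(a[2])                # 5.0: the "missing" value was never gone
```

The last line is exactly the behaviour that is hard to explain to NA users: under a mask, a value you stored NA over can reappear.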

I think Nathaniel's point is that np.IGNORE is a different idea from np.NA, and that is why joining the implementations can lead to conceptual confusion. For example, for:

    a = np.array([np.NA, 1])

you might expect the result of a.sum() to be np.NA. That's what it is in R. However, for:

    b = np.array([np.IGNORE, 1])

you'd probably expect b.sum() to be 1. That's what it is for masked_array currently.

The current proposal fuses these two ideas into one implementation. Quoting from the NEP:

    >>> a = np.array([1., 3., np.NA, 7.], masked=True)
    >>> np.sum(a)
    array(NA, dtype='<f8', masked=True)
    >>> np.sum(a, skipna=True)
    11.0

I agree with Nathaniel that there is no practical way of avoiding the full "NAs are in fact values where there's a False in the mask" concept, and that does impose a serious conceptual cost on the 'NA' user.

Best,

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion