Hi,

On Wed, Jun 29, 2011 at 12:39 AM, Mark Wiebe <mwwi...@gmail.com> wrote:
> On Tue, Jun 28, 2011 at 5:20 PM, Matthew Brett <matthew.br...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> ...
>> > (You might think, what difference does it make if you *can* unmask an
>> > item? Us missing data folks could just ignore this feature. But:
>> > whatever we end up implementing is something that I will have to
>> > explain over and over to different people, most of them not
>> > particularly sophisticated programmers. And there's just no sensible
>> > way to explain this idea that if you store some particular value, then
>> > it replaces the old value, but if you store NA, then the old value is
>> > still there.
>>
>> Ouch - yes. No question, that is difficult to explain. Well, I
>> think the explanation might go like this:
>>
>> "Ah, yes, well, that's because in fact numpy records missing values by
>> using a 'mask'. So when you say `a[3] = np.NA`, what you mean is,
>> `a._mask = np.ones(a.shape, np.dtype(bool)); a._mask[3] = False`"
>>
>> Is that fair?
>
> My favorite way of explaining it would be to have a grid of numbers written
> on paper, then have several cardboards with holes poked in them in different
> configurations. Placing these cardboard masks in front of the grid would
> show different sets of non-missing data, without affecting the values stored
> on the paper behind them.

Right - but here of course you are trying to explain the mask, and
this is Nathaniel's point: in order to explain NAs, you have to
explain masks, and so, even at a basic level, the fusion of the two
ideas is obvious, and already confusing. I mean this:

a[3] = np.NA

"Oh, so you just set the a[3] value to have some missing value code?"

"Ah - no - in fact what I did was set an associated mask in position
a[3] so that you can no longer see the previous value of a[3]"

"Huh. You mean I have a mask for every single value in order to be
able to blank out a[3]? It looks like an assignment. I mean, it
looks just like a[3] = 4. But I guess it isn't?"

"Er..."

I think Nathaniel's point is a very good one - these are separate
ideas, np.NA and np.IGNORE, and a joint implementation is bound to
draw them together in the mind of the user. Apart from anything
else, the user has to know that, if they want a single NA value in an
array, they have to add a mask of the same shape as the array, at one
byte per element. They then have to know that NA is implemented by
masking, and at that point the 'NA for free by adding masking' idea
breaks down and starts to feel like a kludge.

The counter argument is of course that, in time, the implementation of
NA with masking will seem as obvious and intuitive as, say,
broadcasting, and that we are just reacting from lack of experience
with the new API. Of course, that does happen, but here, unless I am
mistaken, the primary drive to fuse NA and masking is ease of
implementation. That doesn't necessarily mean that they don't go
together - if something is easy to implement, sometimes it means it
will also feel natural in use - but at least we might say that there is
some risk of the implementation driving the API, and that that can
lead to problems.

See you,

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
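
To make the a[3] = np.NA dialogue concrete, here is a minimal
pure-Python sketch of NA-by-masking. The names MaskedList, NA and
unmask are hypothetical, made up for this illustration; this is not
numpy's implementation or the proposed API, and the choice that
storing a real value also unmasks the slot is an assumption. It only
shows the asymmetry under discussion: storing a value replaces the old
one, while storing NA flips a mask entry and leaves the old value in
place.

```python
# Hypothetical sketch of "NA implemented by a mask".
# MaskedList, NA and unmask are illustrative names, not numpy API.

NA = object()  # stand-in sentinel for the proposed np.NA


class MaskedList:
    def __init__(self, values):
        self._data = list(values)
        # One mask byte per element: 1 means visible, 0 means masked.
        self._mask = bytearray([1]) * len(self._data)

    def __setitem__(self, i, value):
        if value is NA:
            # Looks like an assignment, but only the mask changes;
            # the stored value is left untouched.
            self._mask[i] = 0
        else:
            # Assumption: storing a real value also unmasks the slot.
            self._data[i] = value
            self._mask[i] = 1

    def __getitem__(self, i):
        return self._data[i] if self._mask[i] else NA

    def unmask(self, i):
        # Reveal whatever value is still stored behind the mask.
        self._mask[i] = 1


a = MaskedList([10, 20, 30, 40])

a[3] = NA
masked_view = a[3]         # the NA sentinel
hidden_value = a._data[3]  # the old value is still stored

a.unmask(3)
restored = a[3]            # the old value is visible again

a[3] = 4
replaced = a._data[3]      # an ordinary store does replace the value

mask_bytes = len(a._mask)  # one mask byte per element
```

After a[3] = NA the 40 is still sitting in the data; only the mask
changed. That is the behaviour, and the per-element mask cost, that
would have to be explained to the user.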