Matthew, Dag, +1.

On Jun 29, 2011 4:35 PM, "Dag Sverre Seljebotn" <d.s.seljeb...@astro.uio.no> wrote:
> On 06/29/2011 03:45 PM, Matthew Brett wrote:
>> Hi,
>>
>> On Wed, Jun 29, 2011 at 12:39 AM, Mark Wiebe <mwwi...@gmail.com> wrote:
>>> On Tue, Jun 28, 2011 at 5:20 PM, Matthew Brett <matthew.br...@gmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>>> ...
>>>>> (You might think, what difference does it make if you *can* unmask an
>>>>> item? Us missing data folks could just ignore this feature. But:
>>>>> whatever we end up implementing is something that I will have to
>>>>> explain over and over to different people, most of them not
>>>>> particularly sophisticated programmers. And there's just no sensible
>>>>> way to explain this idea that if you store some particular value, then
>>>>> it replaces the old value, but if you store NA, then the old value is
>>>>> still there.
>>>>
>>>> Ouch - yes. No question, that is difficult to explain. Well, I
>>>> think the explanation might go like this:
>>>>
>>>> "Ah, yes, well, that's because in fact numpy records missing values by
>>>> using a 'mask'. So when you say `a[3] = np.NA`, what you mean is,
>>>> `a._mask = np.ones(a.shape, np.dtype(bool)); a._mask[3] = False`"
>>>>
>>>> Is that fair?
>>>
>>> My favorite way of explaining it would be to have a grid of numbers written
>>> on paper, then have several cardboards with holes poked in them in different
>>> configurations. Placing these cardboard masks in front of the grid would
>>> show different sets of non-missing data, without affecting the values stored
>>> on the paper behind them.
>>
>> Right - but here of course you are trying to explain the mask, and
>> this is Nathaniel's point, that in order to explain NAs, you have to
>> explain masks, and so, even at a basic level, the fusion of the two
>> ideas is obvious, and already confusing.
>> I mean this:
>>
>> a[3] = np.NA
>>
>> "Oh, so you just set the a[3] value to have some missing value code?"
>>
>> "Ah - no - in fact what I did was set an associated mask in position
>> a[3] so that you can't any longer see the previous value of a[3]"
>>
>> "Huh. You mean I have a mask for every single value in order to be
>> able to blank out a[3]? It looks like an assignment. I mean, it
>> looks just like a[3] = 4. But I guess it isn't?"
>>
>> "Er..."
>>
>> I think Nathaniel's point is a very good one - these are separate
>> ideas, np.NA and np.IGNORE, and a joint implementation is bound to
>> draw them together in the mind of the user. Apart from anything
>> else, the user has to know that, if they want a single NA value in an
>> array, they have to add a mask the size of array.shape, in bytes. They
>> have to know then, that NA is implemented by masking, and then the 'NA
>> for free by adding masking' idea breaks down and starts to feel like a
>> kludge.
>>
>> The counter argument is of course that, in time, the implementation of
>> NA with masking will seem as obvious and intuitive as, say,
>> broadcasting, and that we are just reacting from lack of experience
>> with the new API.
>
> However, no matter how used we get to this, people coming from almost
> any other tool (in particular R) will keep thinking it is
> counter-intuitive. Why set up a major semantic incompatibility that
> people then have to overcome in order to start using NumPy?
>
> I really don't see what's wrong with some more explicit API like
> a.mask[3] = True. "Explicit is better than implicit".
>
> Dag Sverre
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
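
For concreteness, here is a rough sketch of the two behaviours being contrasted in the thread. It does not use the proposed np.NA or np.IGNORE. As stand-ins (my assumption, not part of the proposal) it uses the existing numpy.ma module for the "mask" behaviour and a plain NaN for the "missing value code" behaviour. Note that numpy.ma uses True to mean "hidden", which matches Dag's a.mask[3] = True and is the opposite of the a._mask sketch quoted above.

```python
import numpy as np
import numpy.ma as ma

# Mask semantics (numpy.ma as a stand-in for the proposed mask-based NA):
# masking hides an element but leaves the stored value untouched.
a = ma.array([10.0, 11.0, 12.0, 13.0, 14.0])
a[3] = ma.masked                 # reads like an assignment, only sets a mask entry
is_hidden = a[3] is ma.masked    # the element now reads back as masked
hidden_payload = a.data[3]       # the old value is still in the data buffer

a.mask[3] = False                # explicit unmasking
restored = a[3]                  # the previous value is visible again

# The mask is a full boolean array: one byte per element of `a`.
mask_bytes = ma.getmaskarray(a).nbytes

# Sentinel semantics (NaN as a stand-in for an R-style NA):
# storing the missing-value code overwrites the old value for good.
b = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
b[3] = np.nan
```

In the first half the value 13.0 survives being "set to missing" and comes back on unmasking. In the second half it is gone. That difference is the thing that would have to be explained to users if both ideas share one spelling.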