On 06/29/2011 07:38 PM, Mark Wiebe wrote:
> On Wed, Jun 29, 2011 at 9:35 AM, Dag Sverre Seljebotn
> <d.s.seljeb...@astro.uio.no> wrote:
>
>> On 06/29/2011 03:45 PM, Matthew Brett wrote:
>>> Hi,
>>>
>>> On Wed, Jun 29, 2011 at 12:39 AM, Mark Wiebe <mwwi...@gmail.com> wrote:
>>>> On Tue, Jun 28, 2011 at 5:20 PM, Matthew Brett
>>>> <matthew.br...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>>>> ...
>>>>>> (You might think, what difference does it make if you *can* unmask an
>>>>>> item? Us missing data folks could just ignore this feature. But:
>>>>>> whatever we end up implementing is something that I will have to
>>>>>> explain over and over to different people, most of them not
>>>>>> particularly sophisticated programmers. And there's just no sensible
>>>>>> way to explain this idea that if you store some particular value, then
>>>>>> it replaces the old value, but if you store NA, then the old value is
>>>>>> still there.
>>>>>
>>>>> Ouch - yes. No question, that is difficult to explain. Well, I
>>>>> think the explanation might go like this:
>>>>>
>>>>> "Ah, yes, well, that's because in fact numpy records missing values by
>>>>> using a 'mask'. So when you say `a[3] = np.NA`, what you mean is,
>>>>> `a._mask = np.ones(a.shape, np.dtype(bool)); a._mask[3] = False`"
>>>>>
>>>>> Is that fair?
>>>>
>>>> My favorite way of explaining it would be to have a grid of numbers written
>>>> on paper, then have several cardboards with holes poked in them in different
>>>> configurations. Placing these cardboard masks in front of the grid would
>>>> show different sets of non-missing data, without affecting the values stored
>>>> on the paper behind them.
>>>
>>> Right - but here of course you are trying to explain the mask, and
>>> this is Nathaniel's point, that in order to explain NAs, you have to
>>> explain masks, and so, even at a basic level, the fusion of the two
>>> ideas is obvious, and already confusing. I mean this:
>>>
>>> a[3] = np.NA
>>>
>>> "Oh, so you just set the a[3] value to have some missing value code?"
>>>
>>> "Ah - no - in fact what I did was set an associated mask in position
>>> a[3] so that you can't any longer see the previous value of a[3]"
>>>
>>> "Huh. You mean I have a mask for every single value in order to be
>>> able to blank out a[3]? It looks like an assignment. I mean, it
>>> looks just like a[3] = 4. But I guess it isn't?"
>>>
>>> "Er..."
>>>
>>> I think Nathaniel's point is a very good one - these are separate
>>> ideas, np.NA and np.IGNORE, and a joint implementation is bound to
>>> draw them together in the mind of the user. Apart from anything
>>> else, the user has to know that, if they want a single NA value in an
>>> array, they have to add a mask size array.shape in bytes. They have
>>> to know then, that NA is implemented by masking, and then the 'NA for
>>> free by adding masking' idea breaks down and starts to feel like a
>>> kludge.
>>>
>>> The counter argument is of course that, in time, the implementation of
>>> NA with masking will seem as obvious and intuitive, as, say,
>>> broadcasting, and that we are just reacting from lack of experience
>>> with the new API.
>>
>> However, no matter how used we get to this, people coming from almost
>> any other tool (in particular R) will keep thinking it is
>> counter-intuitive. Why set up a major semantic incompatibility that
>> people then have to overcome in order to start using NumPy?
>
> I'm not aware of a semantic incompatibility. I believe R doesn't support
> views like NumPy does, so the things you have to do to see masking
> semantics aren't even possible in R.

Well, whether the same feature is possible or not in R is irrelevant to
whether a semantic incompatibility would exist.

Views themselves are a *major* semantic incompatibility, and are highly
confusing at first to MATLAB/Fortran/R people. However, they have major
advantages outweighing the disadvantage of having to caution new users.

But there's simply no precedent anywhere for an assignment that doesn't
erase the old value for a particular input value, and the advantages
seem pretty minor (well, I think it is ugly in its own right, but that
is beside the point...)

Dag Sverre
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
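
To make the distinction in this thread concrete, here is a toy sketch in
plain Python. It is not NumPy code and not the proposed implementation:
the names NA, SentinelArray, MaskedToyArray and unmask are made up for
illustration only. It contrasts an "NA as a stored value" model, where
assigning NA erases the old value, with a mask-based model, where
assigning NA only hides it.

```python
# Hypothetical marker standing in for a missing value (not np.NA).
NA = object()


class SentinelArray:
    """Toy model: NA is stored in place of the value, so the old value is gone."""

    def __init__(self, values):
        self._data = list(values)

    def __setitem__(self, index, value):
        # Every assignment, including NA, overwrites the stored element.
        self._data[index] = value

    def __getitem__(self, index):
        return self._data[index]


class MaskedToyArray:
    """Toy model: NA only flips a per-element flag; the data stays underneath."""

    def __init__(self, values):
        self._data = list(values)
        self._hidden = [False] * len(self._data)

    def __setitem__(self, index, value):
        if value is NA:
            # Assigning NA hides the element but does not touch the data.
            self._hidden[index] = True
        else:
            # Assigning an ordinary value replaces the data and exposes it.
            self._data[index] = value
            self._hidden[index] = False

    def __getitem__(self, index):
        return NA if self._hidden[index] else self._data[index]

    def unmask(self, index):
        # Hypothetical helper: reveal whatever value is still stored.
        self._hidden[index] = False


s = SentinelArray([10, 20, 30, 40])
s[3] = NA                 # the 40 is overwritten and cannot be recovered

m = MaskedToyArray([10, 20, 30, 40])
m[3] = NA                 # looks like the same assignment...
m.unmask(3)               # ...but the 40 was never erased
print(m[3])               # prints 40
```

In the second model `m[3] = NA` and `m[3] = 4` look identical at the
call site but behave differently, which is the part that is hard to
explain to new users.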