Oops,

On Wed, Jun 29, 2011 at 8:32 PM, Matthew Brett <matthew.br...@gmail.com> wrote:
> Hi,
>
> On Wed, Jun 29, 2011 at 6:22 PM, Mark Wiebe <mwwi...@gmail.com> wrote:
>> On Wed, Jun 29, 2011 at 8:20 AM, Lluís <xscr...@gmx.net> wrote:
>>>
>>> Matthew Brett writes:
>>>
>>> >> Maybe instead of np.NA, we could say np.IGNORE, which sort of conveys
>>> >> the idea that the entry is still there, but we're just ignoring it. Of
>>> >> course, that goes against common convention, but it might be easier to
>>> >> explain.
>>>
>>> > I think Nathaniel's point is that np.IGNORE is a different idea than
>>> > np.NA, and that is why joining the implementations can lead to
>>> > conceptual confusion.
>>>
>>> This is how I see it:
>>>
>>> >>> a = np.array([0, 1, 2], dtype=int)
>>> >>> a[0] = np.NA
>>> ValueError
>>> >>> e = np.array([np.NA, 1, 2], dtype=int)
>>> ValueError
>>> >>> b = np.array([np.NA, 1, 2], dtype=np.maybe(int))
>>> >>> m = np.array([np.NA, 1, 2], dtype=int, masked=True)
>>> >>> bm = np.array([np.NA, 1, 2], dtype=np.maybe(int), masked=True)
>>> >>> b[1] = np.NA
>>> >>> np.sum(b)
>>> np.NA
>>> >>> np.sum(b, skipna=True)
>>> 2
>>> >>> b.mask
>>> None
>>> >>> m[1] = np.NA
>>> >>> np.sum(m)
>>> 2
>>> >>> np.sum(m, skipna=True)
>>> 2
>>> >>> m.mask
>>> [False, False, True]
>>> >>> bm[1] = np.NA
>>> >>> np.sum(bm)
>>> 2
>>> >>> np.sum(bm, skipna=True)
>>> 2
>>> >>> bm.mask
>>> [False, False, True]
>>>
>>> So:
>>>
>>> * Mask takes precedence over bit pattern on element assignment. There's
>>>   still the question of how to assign a bit pattern NA when the mask is
>>>   active.
>>>
>>> * When using mask, elements are automagically skipped.
>>>
>>> * "m[1] = np.NA" is equivalent to "m.mask[1] = False"
>>>
>>> * When using bit pattern + mask, it might make sense to have the initial
>>>   values as bit-pattern NAs, instead of masked (i.e., "bm.mask == [True,
>>>   False, True]" and "np.sum(bm) == np.NA")
>>
>> There seems to be a general idea that masks and NA bit patterns imply
>> particular differing semantics, something which I think is simply false.
>
> Well - first - it's helpful surely to separate the concepts and the
> implementation.
>
> Concepts / use patterns (as delineated by Nathaniel):
> A) missing values == 'np.NA' in my emails. Can we call that CMV
> (concept missing values)?
> B) masks == np.IGNORE in my emails. CMSK (concept masks)?
>
> Implementations
> 1) bit-pattern == na-dtype - how about we call that IBP
> (implementation bit pattern)?
> 2) array.mask. IM (implementation mask)?
>
> Nathaniel implied that:
>
> CMV implies: sum([np.NA, 1]) == np.NA
> CMSK implies sum([np.NA, 1]) == 1
>
> and indeed, that's how R and masked arrays respectively behave. So I
> think it's reasonable to say that at least R thought that the bitmask
> implied the first and Pierre and others thought the mask meant the
> second.
>
> The NEP as it stands thinks of CMV and CMSK as being different views
> of the same thing. Please correct me if I'm wrong.
>
>> Both NaN and Inf are implemented in hardware with the same idea as the NA
>> bit pattern, but they do not follow NA missing value semantics.
>
> Right - and that doesn't affect the argument, because the argument is
> about the concepts and not the implementation.
>
>> As far as I can tell, the only required difference between them is that NA
>> bit patterns must destroy the data. Nothing else.
>
> I think Nathaniel's point was about the expected default behavior in
> the different concepts.
>
>> Everything on top of that
>> is a choice of API and interface mechanisms.
>> I want them to behave exactly
>> the same except for that necessary difference, so that it will be possible
>> to use the *exact same Python code* with either approach.
>
> Right. And Nathaniel's point is that that desire leads to fusion of
> the two ideas into one when they should be separated. For example, if
> I understand correctly:
>
>>>> a = np.array([1.0, 2.0, 3, 7.0], masked=True)
>>>> b = np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>>>> a[3] = np.NA  # actual real hand-on-heart assignment
>>>> b[3] = np.NA  # magic mask setting although it looks the same

I meant:

>>> a = np.array([1.0, 2.0, 3.0, 7.0], masked=True)
>>> b = np.array([1.0, 2.0, 3.0, 7.0], dtype='NA[f8]')
>>> b[3] = np.NA  # actual real hand-on-heart assignment
>>> a[3] = np.NA  # magic mask setting although it looks the same

Sorry,

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
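
For anyone who wants to try the distinction in a released NumPy: np.NA, masked=True and dtype='NA[f8]' are spellings proposed in this thread, not something that can be run as written. The sketch below therefore substitutes np.nan for the bit-pattern case and numpy.ma for the mask case. Those substitutions are an assumption made purely for illustration; they are not the NEP's semantics, and NaN in particular is only a rough stand-in for NA.

```python
import numpy as np

# Bit-pattern stand-in (assumption: np.nan plays the role of np.NA).
# The marker lives in the element itself, so assignment destroys the 7.0.
b = np.array([1.0, 2.0, 3.0, 7.0])
b[3] = np.nan
sum_b = np.sum(b)          # propagates, like the CMV idea sum([NA, 1]) == NA
sum_b_skip = np.nansum(b)  # skipping has to be requested explicitly

# Mask stand-in (assumption: numpy.ma plays the role of masked=True).
# The assignment only flips a mask entry; the stored value is left alone.
a = np.ma.array([1.0, 2.0, 3.0, 7.0])
a[3] = np.ma.masked
sum_a = a.sum()            # masked entry is skipped, like the CMSK idea
hidden = a.data[3]         # the original value is still in the data buffer

print("bit pattern:", sum_b, sum_b_skip)
print("mask:       ", sum_a, hidden, a.mask.tolist())
```

The two assignments look alike but do different things: b[3] overwrites the value, while a[3] hides it and leaves it recoverable through a.data.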