On Mon, Jun 27, 2011 at 8:55 AM, Mark Wiebe <mwwi...@gmail.com> wrote:
> First I'd like to thank everyone for all the feedback you're providing,
> clearly this is an important topic to many people, and the discussion has
> helped clarify the ideas for me. I've renamed and updated the NEP, then
> placed it into the master NumPy repository so it has a more permanent home
> here:
>
> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
>
> In the NEP, I've tried to address everything that was raised in the original
> thread and in Nathaniel's followup 'Concepts' thread. To deal with the issue
> of whether a mask is True or False for a missing value, I've removed the
> 'mask' attribute entirely, except for ufunc-like functions np.ismissing and
> np.isavail which return the two styles of masks. Here's a high level summary
> of how I'm thinking of the topic, and what I will implement:
>
> Missing Data Abstraction
>
> There appear to be two useful ways to think about missing data that are
> worth supporting.
>
> 1) Unknown yet existing data
> 2) Data that doesn't exist
>
> In 1), an NA value causes outputs to become NA except in a small number of
> exceptions such as boolean logic, and in 2), operations treat the data as if
> there were a smaller array without the NA values.
>
> Temporarily Ignoring Data
>
> In some cases, it is useful to flag data as NA temporarily, possibly in
> several different ways, for particular calculations or testing out different
> ways of throwing away outliers. This is independent of the missing data
> abstraction, still requiring a choice of 1) or 2) above.
>
> Implementation Techniques
>
> There are two mechanisms generally used to implement missing data
> abstractions,
>
> 1) An NA bit pattern
> 2) A mask
>
> I've described a design in the NEP which can include both techniques using
> the same interface. The mask approach is strictly more general than the NA
> bit pattern approach, except for a few things like the idea of supporting
> the dtype 'NA[f8,InfNan]' which you can read about in the NEP.
>
> My intention is to implement the mask-based design, and possibly also
> implement the NA bit pattern design, but if anything gets cut it will be the
> NA bit patterns.
>
> Thanks again for all your input so far, and thanks in advance for your
> suggestions for improving this new revision of the NEP.
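
For reference, here is a rough sketch of the two missing data abstractions from the quoted summary, written in plain Python rather than against the NEP's proposed interface. The names `NA`, `sum_propagate`, and `sum_skip` are hypothetical, and the `available` list plays the role of a mask in which True means the value is available.

```python
# Hypothetical stand-in for the NEP's NA; this is not NumPy's API.
NA = None


def sum_propagate(values, available):
    """Abstraction 1: unknown yet existing data, so any NA makes the result NA."""
    total = 0.0
    for value, ok in zip(values, available):
        if not ok:
            return NA
        total += value
    return total


def sum_skip(values, available):
    """Abstraction 2: data that doesn't exist, so act like a smaller array."""
    return sum(value for value, ok in zip(values, available) if ok)


values = [1.0, 2.0, 4.0]
available = [True, False, True]  # True = available, False = missing

print(sum_propagate(values, available))  # None (the NA stand-in)
print(sum_skip(values, available))       # 5.0
```

The same data and the same mask give different answers depending on which abstraction is chosen, which is why the summary treats that choice as separate from the implementation technique.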
I'm trying to understand this part of the missing data NEP:

    "While numpy.NA works to mask values, it does not itself have a dtype.
    This means that returning the numpy.NA singleton from an operation like
    'arr[0]' would be throwing away the dtype, which is still valuable to
    retain, so 'arr[0]' will return a zero-dimensional array either with its
    value masked, or containing the NA bit pattern for the array's dtype."

If I do something like this in Cython:

    cdef np.float64_t ai
    for i in range(n):
        ai = a[i]
        ...

Then I need to specify the type of ai, say float64 as above. What happens
when a[i] is np.NA? Is ai still a float64? If NA is a bit pattern taken from
float64 then a[i] could be float64, but if it is a 0d array then it would not
be float64 and I assume I would run into problems or have to cast.

So what does all this mean for iterating over each element of an array in
Cython or C? Would I need to check the mask of element i first and only
assign to ai if the mask is True (meaning not missing)?
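
To make the two cases in the question concrete, here is a sketch in plain Python, with the standard library's `array` module standing in for a float64 buffer. This is not the NEP's interface. Using NaN as the NA bit pattern, and using the separate `available` list as a mask (True meaning not missing), are both assumptions made for illustration.

```python
import math
from array import array

# Case 1: NA as a bit pattern. Every element is still a C double, so a
# typed variable can always hold it. NaN is the stand-in pattern here.
a = array("d", [1.5, math.nan, 3.0])
total = 0.0
for i in range(len(a)):
    ai = a[i]              # always a float, even for the NA stand-in
    if math.isnan(ai):     # detect the stand-in NA pattern and skip it
        continue
    total += ai

# Case 2: NA as a mask. The data buffer holds an arbitrary value at a
# missing position, so check the mask before using the element.
data = array("d", [1.5, 0.0, 3.0])
available = [True, False, True]
masked_total = 0.0
for i in range(len(data)):
    if available[i]:
        ai = data[i]
        masked_total += ai

print(total, masked_total)  # 4.5 4.5
```

In the bit-pattern case the loop variable keeps its type and the check happens on the value. In the mask case the check has to happen before the value is read.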