It appears to me that one of the biggest reasons some of us have been talking
past each other in these discussions is that different people are using
different definitions for the same terms. Until this is thoroughly cleared up,
I feel the design process is tilting at windmills.

In the interests of clarity in our discussions, here is a starting point
which is consistent with the NEP. These definitions have been added to a
glossary within the NEP. If we can agree on any amendments to these
definitions, I will update the NEP accordingly. Also, if I missed any
important terms that need to be added, please propose definitions for them.

NA (Not Available)
    A placeholder for a value which is unknown to computations. That
    value may be temporarily hidden with a mask, may have been lost
    due to hard drive corruption, or may be gone for any number of other
    reasons. This is the same as NA in the R project.

IGNORE (Skip/Ignore)
    A placeholder which should be treated by computations as if no value
    does or could exist there. For sums, this means acting as if the value
    were zero, and for products, acting as if the value were one.
    It is as if the array had been compressed in some fashion to exclude
    that element.
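To make the NA/IGNORE difference concrete, here is a minimal sketch using float NaN as a stand-in placeholder. That is an assumption for illustration only: NaN is a bitpattern NumPy already provides, not a proposed NA implementation. `np.sum` shows NA-style propagation, while `np.nansum` and `np.nanprod` show IGNORE-style skipping (0 for sums, 1 for products), which matches reducing over a compressed copy of the array.

```python
import numpy as np

# Assumption: NaN stands in for the placeholder purely for illustration.
data = np.array([1.0, 2.0, np.nan, 4.0])

# NA semantics: an unknown input makes the result unknown, so it propagates.
na_sum = np.sum(data)            # nan

# IGNORE semantics: skip the element, i.e. treat it as 0 for a sum
# and as 1 for a product.
ignore_sum = np.nansum(data)     # 7.0
ignore_prod = np.nanprod(data)   # 8.0

# IGNORE is equivalent to reducing over a compressed copy of the array.
compressed = data[~np.isnan(data)]

print(na_sum, ignore_sum, ignore_prod, compressed.sum())
```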

bitpattern
    A technique for implementing either NA or IGNORE, where a particular
    set of bit patterns is reserved from all the possible bit patterns of
    the value's data type to signal that the element is NA or IGNORE.
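A rough sketch of the bitpattern technique for an integer type follows. The choice of the most negative `int64` as the reserved pattern, and the helper name `is_flagged`, are hypothetical and only meant to illustrate the idea; nothing here says which patterns a real implementation would reserve.

```python
import numpy as np

# Hypothetical choice: reserve the most negative int64 value as the
# bitpattern that signals "NA" (or "IGNORE").
NA_INT64 = np.iinfo(np.int64).min

def is_flagged(values):
    """Return a boolean array marking elements that hold the reserved bitpattern."""
    return np.asarray(values) == NA_INT64

a = np.array([1, 2, NA_INT64, 4], dtype=np.int64)
flags = is_flagged(a)   # [False, False, True, False]

print(flags)
```

The trade-off is visible in the sketch: the reserved value can no longer be stored as ordinary data, because it is indistinguishable from the marker, and no separate storage is needed to track which elements are flagged.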

mask
    A technique for implementing either NA or IGNORE, where a
    boolean or enum array parallel to the data array is used to signal
    which elements are NA or IGNORE.
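For contrast, a minimal sketch of the mask technique, assuming a boolean array where `True` means "currently hidden" (that polarity is an arbitrary choice for this example). Because the mask lives alongside the data, hiding an element does not destroy its value, which fits the "temporarily hidden with a mask" case in the NA definition. Here `np.sum(..., where=...)` happens to give IGNORE-style behavior; whether masked elements mean NA or IGNORE is a separate decision.

```python
import numpy as np

data = np.array([10, 20, 30, 40])
# Assumed convention for this sketch: True means the element is hidden.
hidden = np.zeros(data.shape, dtype=bool)

hidden[2] = True                             # hide an element; data is untouched
visible_sum = np.sum(data, where=~hidden)    # 70

hidden[2] = False                            # unhide it; the original value is intact
restored_sum = np.sum(data, where=~hidden)   # 100

print(visible_sum, restored_sum, data[2])
```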

numpy.ma
    The existing implementation of a particular form of masked arrays,
    which is part of the NumPy codebase.
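As one illustration of the particular choices `numpy.ma` makes, the short example below uses its public API: reductions such as `sum` and `prod` skip masked elements (IGNORE-like), element-wise arithmetic carries the mask into the result, and the underlying data stays in place. This is only a sample of its behavior, not a full description of its semantics.

```python
import numpy.ma as ma

a = ma.masked_array([1, 2, 3, 4], mask=[False, False, True, False])

# Reductions skip masked elements.
total = a.sum()      # 7
product = a.prod()   # 8

# Element-wise operations propagate the mask to the result.
shifted = a + 1

print(total, product, shifted.mask.tolist(), a.data.tolist())
```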


The most important distinctions I'm trying to draw are:

1) The NA vs IGNORE distinction and the bitpattern vs mask distinction are
completely independent. All four combinations, NA as bitpattern, NA as mask,
IGNORE as bitpattern, and IGNORE as mask, are reasonable.

2) The general idea of masking and the numpy.ma implementation are
different things. The numpy.ma object makes particular choices about how to
interpret the mask, and while backwards compatibility is important, a fresh
evaluation of all the design choices that go into a mask implementation is
worthwhile.
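The independence in point 1 can be sketched by separating the two questions: the storage technique only decides how the missing elements are located, and the semantics only decide how a reduction treats them. All names below (`NA_BITPATTERN`, `missing_from_bitpattern`, `missing_from_mask`, `reduce_sum`) are hypothetical, and returning `None` for an NA result is an assumption made just for this sketch.

```python
import numpy as np

NA_BITPATTERN = np.iinfo(np.int64).min  # hypothetical reserved value

def missing_from_bitpattern(data):
    # Storage technique 1: the marker lives inside the data itself.
    return data == NA_BITPATTERN

def missing_from_mask(mask):
    # Storage technique 2: a parallel boolean array (True = missing here).
    return np.asarray(mask, dtype=bool)

def reduce_sum(data, missing, semantics):
    """Sum `data`, treating elements flagged in `missing` per `semantics`."""
    if semantics == "NA":
        # Any unknown input makes the result unknown (None in this sketch).
        return None if missing.any() else int(data.sum())
    if semantics == "IGNORE":
        # Act as if the flagged elements were not in the array at all.
        return int(data[~missing].sum())
    raise ValueError(f"unknown semantics: {semantics!r}")

bp_data = np.array([1, 2, NA_BITPATTERN, 4], dtype=np.int64)
mask_data = np.array([1, 2, 3, 4], dtype=np.int64)
mask = [False, False, True, False]

# All four combinations go through the same reduction code.
for storage, data, missing in [
    ("bitpattern", bp_data, missing_from_bitpattern(bp_data)),
    ("mask", mask_data, missing_from_mask(mask)),
]:
    for semantics in ("NA", "IGNORE"):
        print(storage, semantics, reduce_sum(data, missing, semantics))
```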

Thanks,
Mark
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
