Re: [Numpy-discussion] HPC missing data - was: NA/Missing Data Conference Call Summary

Nathaniel Smith Wed, 06 Jul 2011 11:10:41 -0700

On Wed, Jul 6, 2011 at 6:12 AM, Dag Sverre Seljebotn
<[email protected]> wrote:
> What I'm saying is that Mark's proposal is more flexible. Say for the
> sake of the argument that I have two codes I need to interface with:
>
>  - Library A is written in Fortran and uses a separate (explicit) mask
> array for NA
>
>  - Library B runs on a GPU and uses a bit pattern for NA

Have you ever encountered any such codes? I'm not aware of any code
outside of R that implements the proposed NA semantics -- esp. in
high-performance code, people generally want to avoid lots of
conditionals, and the proposed NA semantics require a branch around
every operation inside your inner loops.
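
To make the branching concrete, here is a rough pure-Python sketch of an
elementwise add with propagating NA semantics over a separate mask. It is
not code from the NEP or from any library; the function name `add_with_na`
and the convention that a True mask entry means NA are assumptions made
for illustration only.

```python
def add_with_na(a, a_mask, b, b_mask):
    """Elementwise a + b, where a True mask entry marks that element as NA.

    Hypothetical helper: the name and the mask convention are assumptions.
    """
    n = len(a)
    out = [0.0] * n
    out_mask = [False] * n
    for i in range(n):
        # This test runs once per element, inside the inner loop.
        if a_mask[i] or b_mask[i]:
            out_mask[i] = True   # NA propagates to the result
        else:
            out[i] = a[i] + b[i]
    return out, out_mask


result, result_mask = add_with_na(
    [1.0, 2.0, 3.0], [False, True, False],
    [10.0, 20.0, 30.0], [False, False, False],
)
```

The `if` inside the loop is the per-operation conditional I mean; a
compiled version would carry the same test in its inner loop.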

Certainly there is code out there that uses NaNs, and code that uses
masks (in various ways that might or might not match the way the NEP
uses them). And it's easy to work with both from numpy right now. The
question is whether and how the core should add some tricky and subtle
semantics for a few very specific ways of handling NaN-like objects
and masking.
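
As a small illustration of "easy to work with both from numpy right now",
the sketch below sums the same values once with a NaN marking the missing
entry and once with a numpy.ma mask. The sample values are made up;
`np.nansum` and `np.ma.masked_array` are existing numpy calls.

```python
import numpy as np

# NaN as the missing-value marker: nansum skips NaN entries.
data = np.array([1.0, np.nan, 3.0])
nan_total = float(np.nansum(data))

# Explicit mask as the missing-value marker: True hides an element.
masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
masked_total = float(masked.sum())
```

Both totals come out to 4.0, and neither approach needs anything new in
the core.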

Upthread you also wrote:
> At least I feel that the transparency of NumPy is a huge part of its
> current success. Many more than me spend half their time in C/Fortran
> and half their time in Python.

It's exactly this transparency that worries Matthew and me -- we feel
that the alterNEP preserves it, and the NEP attempts to erase it. In
the NEP, there are two totally different underlying data structures,
but this difference is blurred at the Python level. The idea is that
you shouldn't have to think about which you have, but if you work with
C/Fortran, then of course you do have to be constantly aware of the
underlying implementation anyway. And operations which would obviously
make sense for some of the objects that you know you're working
with (e.g., unmasking elements from a masked array, or even accessing
the mask directly using numpy slicing) are disallowed, specifically in
order to make this distinction harder to make.

According to the NEP, C code that takes a masked array should never
ever unmask any element; unmasking should only be done by making a
full copy of the mask, and attaching it to a new view taken from the
original array. Would you honestly feel obliged to follow this
requirement in your C code? Or would you just unmask elements in place
when it made sense, in order to save memory?
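
The two routes can be compared with plain numpy arrays standing in for the
data and its mask. This is only an analogy at the Python level, not the
NEP's C API; the variable names and the convention that True means masked
are assumptions.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])
mask = np.array([False, True, False])  # assumed convention: True = masked

# Route the NEP asks for: copy the whole mask, edit the copy, and pair it
# with a new view of the same data. The original mask is left untouched.
new_mask = mask.copy()
new_mask[1] = False
data_view = data.view()
extra_bytes = new_mask.nbytes              # memory paid for the mask copy
still_masked_in_original = bool(mask[1])

# The shortcut: flip the one element in place, with no extra allocation.
mask[1] = False
```

For a large array the copy costs one extra mask-sized allocation per
unmasking step, which is exactly the pressure to take the shortcut.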

-- Nathaniel
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
