Hi Ralf,

> So I think the relevant choices are:
> 1. Change nothing to the current status quo (and possibly direct end users 
> who need more than
> what we offer now to `marray`)
> 2. Add a keyword to reductions
> 3. Add a single factory function that turns regular reductions into nan-aware 
> ones (as in
> https://github.com/data-apis/array-api/issues/621#issuecomment-1553481118)
>
> I think (1) is also a very reasonable outcome if we don't like any of the 
> alternatives. 

I am fine with (1), continue to dislike (2), and like (3).

On (1) [status quo], you mentioned that nanptp was rejected earlier as a
new addition to nanfunctions.  If this was because we didn't want to
expand the main numpy namespace (reasonable!), might a sub-option be to
allow expansion in nanfunctions for any regular function in the numpy
namespace, but only expose them in nanfunctions itself?  An advantage
would be that, effectively, those who like to omit NaN could just do
"import numpy.lib.nanfunctions as np".  Of course, at that point perhaps
one should just bite the bullet and move nanfunctions out to its own
package...

On (2) [keyword argument], I continue to dislike the idea of adding new
keyword arguments for the ufunc reductions -- ufuncs are one of the few
bits of numpy API that are really nicely clean and consistent between
many functions.  We have been very careful about extending it, and
keeping it light.  They already allow `np.sum(data, where=~np.isnan(data))`,
so it is not obvious why we would add another option to do the same thing.
Obviously, one could argue that np.sum != np.add.reduce, so their
signatures can diverge, but I'd personally like to move in the opposite
direction (if only for speed for small arrays).
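To make the point concrete, here is the existing `where=` route in action; nothing new is needed for reductions that have an identity:

```python
import numpy as np

data = np.array([1.0, np.nan, 2.0])

# Ufunc reductions already accept `where=`; skipped positions fall back
# on the reduction's identity (0 for add), so NaNs are simply omitted.
total = np.add.reduce(data, where=~np.isnan(data))

# The same keyword is exposed through np.sum.
total2 = np.sum(data, where=~np.isnan(data))
```

(Reductions without an identity, such as maximum, additionally need an `initial=` for this to work.)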

On (3) [factory function], I think a side benefit is that it is the
lightest possible way to provide what is required anyway: wrapper
implementations for functions not yet covered by nanfunctions.
My suggestion of a nan-as-omit Array API compatible wrapper class would
need them, and so would extending nanfunctions to cover more cases.
Indeed, it would even help the keyword-argument case as it would provide
working implementations.
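For illustration only, such a factory might look roughly like the following, built on the `where=` machinery; the name `nan_omitting` and its signature are my own sketch, not the API proposed in the linked issue:

```python
import functools
import numpy as np

def nan_omitting(func):
    """Hypothetical factory: wrap a reduction so it skips NaNs.

    This simply injects where=~isnan(a) unless the caller passes
    their own `where=`; it assumes `func` accepts that keyword.
    """
    @functools.wraps(func)
    def wrapper(a, *args, **kwargs):
        a = np.asanyarray(a)
        kwargs.setdefault("where", ~np.isnan(a))
        return func(a, *args, **kwargs)
    return wrapper

# Turn a regular reduction into a nan-aware one.
nansum = nan_omitting(np.sum)
result = nansum(np.array([1.0, np.nan, 2.0]))
```

As with `where=` itself, reductions lacking an identity would also need an `initial=` supplied.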

Let me also mention again another option: a wrapper data type which
translates floats with NaN to floats with NaN replaced by an
appropriate constant (the reduction's identity by default).  To
opt in, one would do something like,

function(array.astype(NaNOmittingFloat), ...)

But really one could initialize arrays like that and just keep working
with them.  Of course, this would rely completely on Sebastian's custom
dtype mechanism, which has already proven its worth in StringDType, but
which would likely not be recognized by other array classes.  For that,
a custom array class would be best (though given marray that may
actually not be much work at all -- just need to have the mask always
inferred instead of kept as a separate array).
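A real implementation would live in the dtype machinery, but the substitution such a dtype would arrange can be sketched in a few lines (this helper is purely illustrative, not the proposed `NaNOmittingFloat`):

```python
import numpy as np

def reduce_omitting_nan(ufunc, a, **kwargs):
    """Illustrative only: replace NaN with the ufunc's identity before
    reducing -- the substitution a NaN-omitting dtype would do implicitly."""
    if ufunc.identity is None:
        raise ValueError(f"{ufunc.__name__} has no identity to substitute")
    return ufunc.reduce(np.where(np.isnan(a), ufunc.identity, a), **kwargs)

result = reduce_omitting_nan(np.add, np.array([1.0, np.nan, 2.0]))
```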

All the best,

Marten

p.s.  I liked the little summary of what other languages do in
https://github.com/data-apis/array-api/issues/621#issuecomment-1569485778
Julia's seemed a nice functional approach -- it seems a very interesting
language in general, from which it is probably worth getting more ideas...
_______________________________________________
NumPy-Discussion mailing list -- [email protected]