Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-28 Thread Ralf Gommers
On Mon, Apr 27, 2020 at 12:10 AM Sebastian Berg 
wrote:

> On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote:
> > On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers
> > wrote:
> >
> > >
> > > On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser <
> > > wieser.eric+nu...@gmail.com>
> > > wrote:
> > >
> > > > Perhaps worth mentioning that we've discussed this sort of API
> > > > before, in
> > > > https://github.com/numpy/numpy/pull/11897.
> > > >
> > > > Under that proposal, the api would be something like:
> > > >
> > > > * `copy=True` - always copy, like it is today
> > > > * `copy=False` - copy if needed, like it is today
> > > > * `copy=np.never_copy` - never copy, throw an exception if not
> > > > possible
> > > >
> > >
> > > There's a couple of issues I see with using `copy` for __array__:
> > > - copy is already weird (False doesn't mean no), and a [bool,
> > > some_obj_or_str] keyword isn't making that better
> > > - the behavior we're talking about can do more than copying, e.g.
> > > for
> > > PyTorch it would modify the autograd graph by adding detach(), and
> > > for
> > > sparse it's not just "make a copy" (which implies doubling memory
> > > use) but
> > > it densifies which can massively blow up the memory.
> > > - I'm -1 on adding things to the main namespace (never_copy) for
> > > something
> > > that can be handled differently (like a string, or a new keyword)
> > >
> > > tl;dr a new `force` keyword would be better
> > >
> >
> > I agree, “copy” is not a good description of this desired coercion
> > behavior.
> >
> > A new keyword argument like “force” would be much clearer.
> >
>
> That seems fine and practical. But, in the end it seems to me that the
> `force=` keyword just means that some projects want to teach their
> users that:
>
> 1. `np.asarray()` can be expensive (and may always copy)
> 2. `np.asarray()` always loses type properties
>
> while others do not choose to teach about it.  There seems very little
> or even no "promise" attached to either `force=True` or `force=False`.
>
>
> In the end, the question is whether sparse will actually want to
> implement `force=True` if the main reason we add it is for library use.
>

That's for PyData Sparse and scipy.sparse devs to answer. Maybe Hameer can
answer for the former here. For SciPy that should be decided on the
scipy-dev list, but my opinion would be: yes to adding __array__ that
raises TypeError by default, and converts with `force=True`.
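
(As a rough sketch of that behaviour, using an illustrative stand-in class
rather than real scipy.sparse or PyData/Sparse code, and with the `force=`
parameter on `__array__` being the proposed, not-yet-existing part:)

    import numpy as np

    class SparseLike:
        """Illustrative sparse-ish container; not a real library class."""

        def __init__(self, dense_data):
            self._data = np.asarray(dense_data)  # stand-in for sparse storage

        def __array__(self, dtype=None, force=False):
            # Proposed protocol: refuse implicit densification unless the
            # caller explicitly opts in.
            if not force:
                raise TypeError(
                    "implicit conversion to a dense ndarray is disabled; "
                    "pass force=True (densification may blow up memory)"
                )
            return np.asarray(self._data, dtype=dtype)

    arr = SparseLike([0, 0, 3])
    arr.__array__(force=True)   # explicit densification works
    np.asarray(arr)             # raises TypeError, as intended
    # np.asarray(arr, force=True) would be the new spelling once NumPy
    # forwards the keyword to __array__.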


> There is no difference between a visualization library and numpy. In
> both cases the user's memory will blow up just the same.
>
> As for PyTorch, is `.detach()` even a good reason?  Maybe I am missing
> things, but:
>
> >>> torch.ones(10, requires_grad=True) + np.arange(10)
> # RuntimeError: Can't call numpy() on Variable that requires grad. Use
> var.detach().numpy() instead.
>
> So arguably, there is no type-safety concern due to `.detach()`.


I'm not sure what the question is here; no one mentioned type-safety. The
PyTorch maintainers have already said they're fine with adding a force
keyword.
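
(For concreteness, the conversion that error message points users to today,
next to the spelling this proposal would add, looks roughly like this; the
`force=True` call is the hypothetical part:)

>>> import numpy as np, torch
>>> t = torch.ones(10, requires_grad=True)
>>> np.asarray(t.detach())        # today: explicitly drop autograd tracking first
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)
>>> np.asarray(t, force=True)     # proposed: asks the tensor to detach/copy as needed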

> There
> is an (obvious) general loss of type information that always occurs
> with an `np.asarray` call.
> But I do not see that creating any openings for bugs here, due to the
> wisdom of not allowing the above operation.
> In fact, it actually seems much worse for xarray or pandas. They
> do support the above operation and will potentially mess up if the
> arange was previously an xarray with a matching index, but different
> order.
>
>
> I am very much in favor of adding such things, but I still lack a bit
> of clarity as to whom we would be helping?
>

See Juan's first email. I personally am ambivalent on this proposal, but if
Juan and the Napari devs really want it, that's good enough for me.

Cheers,
Ralf



> If end-users will actually use `np.asarray(..., force=True)` over
> special methods, then great! But I am currently not sure the type-
> safety argument is all that big of a point.  And the performance or
> memory-blowup argument remains true even for visualization libraries
> (where the array is purely input and never output as such).
>
>
> But yes, "never copy" is a somewhat different extension to `__array__`
> and `np.asarray`. It guarantees high speed and in-place behaviour which
> is useful for different settings.
>
> - Sebastian
>
>
> >
> > > Cheers,
> > > Ralf
> > >
> > >
> > > > I think the discussion stalled on the precise spelling of the
> > > > third
> > > > option.
> > > >
> > > > `__array__` was not discussed there, but it seems like adding the
> > > > `copy`
> > > > argument to `__array__` would be a perfectly reasonable
> > > > extension.
> > > >
> > > > Eric
> > > >
> > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > > j...@fastmail.com>
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > One bit of expressivity we would miss is “copy if necessary,
> > > > > but
> > > > > > otherwise don’t bother”, but there are workarounds to this.
> > > > 

Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-28 Thread Hameer Abbasi
Hi! Yes, I would advocate for a `force=` kwarg. Personally I don't think
it's explicit enough, but it is probably as explicit as can be given NumPy's API.

Personally, I'd also raise a warning within PyData/Sparse, and I hope the
NumPy docs say in big bold letters to be careful with this.
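
(A minimal sketch of what that could look like inside `__array__`, with an
illustrative class rather than the actual PyData/Sparse code:)

    import warnings
    import numpy as np

    class SparseLike:
        def __init__(self, dense_data):
            self._data = np.asarray(dense_data)   # stand-in for sparse storage

        def __array__(self, dtype=None, force=False):
            if not force:
                raise TypeError("pass force=True to densify explicitly")
            warnings.warn(
                "densifying a sparse array; memory use may grow enormously",
                RuntimeWarning, stacklevel=2,
            )
            return np.asarray(self._data, dtype=dtype)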



Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-28 Thread Sebastian Berg
On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote:

> > So arguably, there is no type-safety concern due to `.detach()`.
> 
> I'm not sure what the question is here; no one mentioned type-safety. 
> The
> PyTorch maintainers have already said they're fine with adding a
> force
> keyword.

But type-safety is the reason to distinguish between:

* np.asarray(tensor)
* np.asarray(tensor, force=True)

Similar to:

* operator.index(obj)
* int(obj)   # converts less type-safely (accepts strings, floats)!
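
(To spell that analogy out:)

>>> import operator
>>> int("3"), int(3.7)        # int() happily coerces strings and floats
(3, 3)
>>> operator.index(3)         # operator.index() only accepts true integers
3
>>> operator.index(3.7)
TypeError: 'float' object cannot be interpreted as an integer

Here `np.asarray(tensor)` plays the role of `operator.index()`, and
`np.asarray(tensor, force=True)` the role of `int()`.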

I actually mentioned 3 reasons in my email:

1. Teach and Inform users (about the next two mainly)
2. Type-safety
3. Expensive conversion 

And only type-safety is related to `.detach()`; I was mentioning that there
may not be a clear story about the usage in that case.

(continued below)

> 

> > 
> > 
> > I am very much in favor of adding such things, but I still lack a
> > bit
> > of clarity as to whom we would be helping?
> > 
> 
> See Juan's first email. I personally am ambivalent on this proposal,
> but if
> Juan and the Napari devs really want it, that's good enough for me.

Of course I read it, twice, but it is only good enough for me if we
actually *solve the issue*, and for that I want to know which issue we
are solving :). It seems obvious, but I am not so sure...

That brings us to the other two reasons:

Teaching and Informing users:

If Napari uses `force=True` indiscriminately, it is not very clear to
the user whether or not the operation is expensive.  I.e. the
user can learn that it is expensive when using `np.asarray(sparse_arr)` with
other libraries, but they are not notified that `napari.vis_func(sparse_arr)`
might kill their computer.

So the "Teaching" part can still partially work, but it no longer informs
the user well on whether or not a function will blow up memory.

Expensive Conversion:

If the main reason is expensive conversions, however, then, as a
library I would probably just use it for half my API, since copying
from GPU to CPU will still be much faster than my own function.


Generally:

I want to help Napari, but it seems like there may be more to this, and
it may be good to finish these thoughts before making a call.

E.g. Napari wants to use it, but do the array-providers want Napari to
use it?

For sparse, Hameer just mentioned that he would still want big warnings
both during the operation and in the `np.asarray` documentation.
If we put such big warnings there, we should have an idea of whom we
expect to ignore that warning. (Napari yes, sklearn sometimes, ...?)

   -> Is "whatever the library feels right" good enough?

And if the conversion still gives warnings for some array-objects, have
we actually gained much?

  -> Maybe we do, end-users may be happy to ignore those warnings...


The one clear use-case for `force=True` is the end-user. Just like no
library uses `int(obj)`, but end-users can use it very nicely.
I am happy to help the end-user in this case, but if that is the target
audience we may want to _discourage_ Napari from using `force=True` and
encourage sparse not to put any RuntimeWarnings on it!
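
(As a sketch of those two end-user spellings, assuming `force=` also lands on
`np.asarray` itself; `.todense()` here is PyData/Sparse's explicit method:)

    import numpy as np
    import sparse                                  # PyData/Sparse

    sparse_arr = sparse.COO(np.eye(5))
    dense = sparse_arr.todense()                   # object-specific spelling, works today
    dense = np.asarray(sparse_arr, force=True)     # proposed generic spelling, not yet valid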

- Sebastian


> Cheers,
> Ralf
> 
> 
> 
> > If end-users will actually use `np.asarray(..., force=True)` over
> > special methods, then great! But I am currently not sure the type-
> > safety argument is all that big of a point.  And the performance or
> > memory-blowup argument remains true even for visualization
> > libraries
> > (where the array is purely input and never output as such).
> > 
> > 
> > But yes, "never copy" is a somewhat different extension to
> > `__array__`
> > and `np.asarray`. It guarantees high speed and in-place behaviour
> > which
> > is useful for different settings.
> > 
> > - Sebastian
> > 
> > 
> > > > Cheers,
> > > > Ralf
> > > > 
> > > > 
> > > > > I think the discussion stalled on the precise spelling of the
> > > > > third
> > > > > option.
> > > > > 
> > > > > `__array__` was not discussed there, but it seems like adding
> > > > > the
> > > > > `copy`
> > > > > argument to `__array__` would be a perfectly reasonable
> > > > > extension.
> > > > > 
> > > > > Eric
> > > > > 
> > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > > > j...@fastmail.com>
> > > > > wrote:
> > > > > 
> > > > > > Hi everyone,
> > > > > > 
> > > > > > One bit of expressivity we would miss is “copy if
> > > > > > necessary,
> > > > > > but
> > > > > > > otherwise don’t bother”, but there are workarounds to
> > > > > > > this.
> > > > > > > 
> > > > > > 
> > > > > > After a side discussion with Stéfan van der Walt, we came
> > > > > > up
> > > > > > with
> > > > > > `allow_copy=True`, which would express to the downstream
> > > > > > library that we
> > > > > > don’t mind waiting, but that zero-copy would also be ok.
> > > > > > 
> > > > > > This sounds like the sort of thing that is use case driven.
> > > > > > If
> > > > > > enough
> > > > > > projects want to use it, then I have no objections to
> > > > > > adding
> > > > > > the keyword.
> > > > > > OTOH, we need

Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-28 Thread Sebastian Berg
On Tue, 2020-04-28 at 09:58 -0500, Sebastian Berg wrote:
> On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote:
> 
> > > So arguably, there is no type-safety concern due to `.detach()`.
> > 
> > I'm not sure what the question is here; no one mentioned type-
> > safety. 
> > The
> > PyTorch maintainers have already said they're fine with adding a
> > force
> > keyword.
> 
> But type-safety is the reason to distinguish between:
> 
> * np.asarray(tensor)
> * np.asarray(tensor, force=True)
> 
> Similar to:
> 
> * operator.index(obj)
> * int(obj)   # converts less type-safely (accepts strings, floats)!
> 
> I actually mentioned 3 reasons in my email:
> 
> 1. Teach and Inform users (about the next two mainly)
> 2. Type-safety
> 3. Expensive conversion 
> 
> And only type-safety is related to `.detach()`; I was mentioning that there
> may not be a clear story about the usage in that case.
> 

(Sorry something got broken here)

The question is what PyTorch's reasons are to feel `np.asarray(tensor)`
should not work generally.
I for one thought it was type-safety with regard to `.detach()`. And
then I was surprised to realize that type-safety might not be a great
reason to reject an implicit `.detach()` within `np.asarray(tensor)`.


In any case, all the long talk simply means that I first want to be clear
on what the concerns are that make libraries reject `np.asarray(tensor)`.
And then, I want to be clear that adding `force=True` will actually
solve those concerns.
And I was surprised myself that this became very much unclear to me.

Again, one reason it is not clear to me is that half the ecosystem
could potentially just always use `force=True`.  So there must be
some "good usage" and some "bad usage", and I would like to know which
is which.

- Sebastian



[Numpy-discussion] NumPy Community Meeting Wednesday

2020-04-28 Thread Sebastian Berg
Hi all,

There will be a NumPy Community meeting Wednesday April 29th at 1pm
Pacific Time (20:00 UTC). Everyone is invited and encouraged to join in
and edit the work-in-progress meeting topics and notes:

https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion