Re: [Numpy-discussion] copy="never" discussion and no deprecation cycle?

2021-06-20 Thread Gagandeep Singh
Hi,

I have recently joined the mailing list and have gone through the previous
discussions on this thread. I would like to share my analysis (advantages
and disadvantages) of three possible alternatives (Enum, String, boolean)
to support the proposed feature.

*Enum*

Advantages

1. Compatibility - Enums (currently, `np.CopyMode`) can be added to support
the never-copy feature without breaking any code which uses NumPy. The
current values of the `copy` argument, `True` and `False`, can be easily
mapped to two members of the above enum, and existing code will keep
working as before. Considering the large user base of NumPy, I think this
is the most significant point to be considered.
2. Clarity and Consistency - Enums inherently provide consistency i.e., all
the values of the copy argument will be of the same type and hence, one
wouldn't have to worry much about using some special values just for the
sake of prohibiting a deep copy. Also, Enums make the intention clear
(np.CopyMode.ALWAYS, etc. already reflect the expected behaviour). Booleans
like True and False are a bit cryptic in nature. In fact, the current
behaviour of False is also a bit confusing. Enums can help in doing away
with this issue without breaking anything which uses previous NumPy
versions.
3. Code will break loudly - If anyone tries to use `np.CopyMode` on a
previous version, the code will break loudly (AttributeError) rather than
doing unpredictable things silently. Fixing silent failures is much more
painful than updating the version, especially in large code bases.

Disadvantages

1. Polluting Namespace - Enums do pollute the global namespace. Maybe this
is unavoidable with the usage of Enums.
2. Inconsistent with APIs where strings are used - Many NumPy APIs use
strings to select among the options for an argument. For example,
`np.linalg.qr` accepts strings for different modes. I think this would be
the first time (if it happens) for an Enum to be used in such a scenario.
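
The compatibility argument for Enums can be sketched roughly as follows.
This is only an illustration in plain Python: the `CopyMode` class, its
member values and the `normalize_copy` helper are hypothetical stand-ins,
not the actual NumPy implementation.

```python
from enum import Enum


class CopyMode(Enum):
    """Hypothetical stand-in for the proposed `np.CopyMode`."""
    IF_NEEDED = 0
    ALWAYS = 1
    NEVER = 2


def normalize_copy(copy):
    """Map legacy booleans onto enum members; reject anything else loudly.

    `normalize_copy` is an assumed helper name used only for this sketch.
    """
    if isinstance(copy, CopyMode):
        return copy
    if isinstance(copy, bool):
        # Existing call sites that pass True/False keep their old meaning.
        return CopyMode.ALWAYS if copy else CopyMode.IF_NEEDED
    raise ValueError(f"copy must be a bool or a CopyMode member, got {copy!r}")
```

With this shape, old code that passes `True` or `False` is unaffected, and
an unsupported value such as a string raises an error instead of being
silently coerced.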

*Strings*

Advantages

1. Consistent with other NumPy APIs - As I said above, strings will keep
things consistent across NumPy.
2. Clarity and Consistency - Strings too provide clarity of intention
regarding the behaviour of the code. If we support strings for all the
cases of copy argument then it would be consistent as well.
3. No pollution of namespace.

I am not sure, but supporting both strings and booleans should be possible,
at least in new NumPy versions, though doing so would not be as easy as
with Enums.

Disadvantages

1. Silent and Unpredictable behaviour on previous NumPy versions - Since
strings can be interpreted as Booleans internally, any non-empty string
maps to `True`, and hence the code will always do a deep copy, irrespective
of the argument. So there would be cases where this goes unnoticed by the
user, and I think the unwanted consequences shouldn't be ignored while
making a choice for this feature.
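
A minimal sketch of this failure mode, assuming an older version simply
coerces the `copy` argument to a boolean as described above. The
`legacy_asarray` function is hypothetical and works on plain lists so that
the example does not depend on any particular NumPy version.

```python
def legacy_asarray(data, copy=True):
    """Hypothetical older function that coerces `copy` to a boolean."""
    if bool(copy):
        # Any truthy value, including the string "never", lands here.
        return list(data)
    # Falsy values: reuse the input when it is already a list.
    return data if isinstance(data, list) else list(data)


original = [1, 2, 3]
result = legacy_asarray(original, copy="never")

# The caller asked for "never", but a copy was made and no error was raised.
copied_silently = result is not original
```

The non-empty string is truthy, so the request to never copy is treated
exactly like `copy=True`.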

*Boolean (True/False/None)*

Advantages

1. Easy to extend - As of now True and False are already supported. None
can additionally be used to support never copy.

Disadvantages

1. Silent behaviour in case of None and False - If someone passes None to
some previous NumPy version then it may behave as False. Hence no error
would be raised, and the copy would be made only if needed rather than
never.
2. Cryptic - The intention is not clearly reflected in these three values.
In fact, False is a bit relaxed in nature: instead of never doing a copy, it
copies only if needed, which should have been the behaviour of None.

*Summary*

To the best of my understanding, Booleans are not a good option when
compared to Strings and Enums. Now, the choice is whether we reject Enums
and accept unpredictable behaviour of user code with strings, or accept
pollution of the namespace in order to support the previous API easily
without breaking anything in future versions.

Please let me know if I missed any important points. Thanks.


On Mon, Jun 21, 2021 at 8:33 AM Stefan van der Walt wrote:

> On Sun, Jun 20, 2021, at 18:53, Charles R Harris wrote:
>
>
> On Fri, Jun 18, 2021 at 8:52 AM Stefan van der Walt wrote:
>
>
> On Thu, Jun 17, 2021, at 16:23, Stephan Hoyer wrote:
>
> This happens all the time. Even if we make copy='never' an error *today*,
> users will be encountering existing versions of NumPy for years into the
> future, so we won't be able to change the behavior of copy='never' for a
> very long time. Our deprecation policy says we would need to wait at least
> one year for this, but frankly I'm not sure that's enough for
> the possibility of silent bugs. 3-4 years might be more realistic.
>
>
> If we go the enum route, we may just as well deprecate string arguments at
> the same time so that we have the flexibility to introduce them again in
> the future.
>
>
> That makes sense to me, but I think this would not preclude the enum from
> being introduced right now. If we make this change, the enum will b

Re: [Numpy-discussion] copy="never" discussion and no deprecation cycle?

2021-06-23 Thread Gagandeep Singh
To me, adding enums as attributes of the `np.copy` function seems like a
pretty good idea. This trick might resolve the only relatively important
issue with Enums. Then, the benefits of Enums might outweigh the
disadvantage of their uncommon usage in NumPy APIs. As an end user, I
would prefer Enums to strings, as the former provide a fixed number of
choices (hence, easy debugging), whereas the latter allow infinitely many
strings to be passed and the code may work silently. Imagine passing
`if_neded` instead of `if_needed` and it working perfectly fine, silently.
This has happened to me while using another library.
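
A rough sketch of the idea, in plain Python. `CopyMode` and `copy_data`
are hypothetical names standing in for the enum and for `np.copy`; nothing
here is the actual NumPy API.

```python
from enum import Enum


class CopyMode(Enum):
    """Hypothetical enum; the member names follow the discussion."""
    ALWAYS = "always"
    IF_NEEDED = "if_needed"
    NEVER = "never"


def copy_data(data, mode=CopyMode.ALWAYS):
    """Hypothetical stand-in for `np.copy`, operating on plain lists."""
    if mode is CopyMode.ALWAYS:
        return list(data)
    # For this sketch, the other modes return the input unchanged.
    return data


# Expose the members as attributes of the function itself, so callers
# can write `copy_data.IF_NEEDED` without another top-level name.
for member in CopyMode:
    setattr(copy_data, member.name, member)


def typo_is_caught():
    """Return True if a misspelled member name fails with AttributeError."""
    try:
        copy_data.IF_NEDED  # misspelled on purpose
    except AttributeError:
        return True
    return False
```

A misspelled member fails at attribute lookup, before the function is even
called, whereas a misspelled string would only be caught if the function
validates its argument.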

On Thu, Jun 24, 2021 at 8:05 AM Benjamin Root wrote:

> Why not both? The definition of the enum might live in a proper namespace
> location, but I see no reason why `np.copy.IF_NEEDED =
> np.flags.CopyFlgs.IF_NEEDED` can't be done (I mean, adding the enum members
> as attributes to the `np.copy()` function). Seems perfectly reasonable to
> me, and reads pretty nicely, too. It isn't like we are dropping support for
> the booleans, so those are still around for easy typing.
>
> Ben Root
>
> On Wed, Jun 23, 2021 at 10:26 PM Stefan van der Walt wrote:
>
>> On Wed, Jun 23, 2021, at 18:01, Juan Nunez-Iglesias wrote:
>> > Personally I was a fan of the Enum approach. People dislike it because
>> > it is not “Pythonic”, but imho that is an accident of history because
>> > Enums only appeared (iirc) in Python 3.4. In fact, they are the right
>> > data structure for this particular problem, so for my money we should
>> > *make it* Pythonic by starting to use it everywhere where we have a
>> > finite list of choices.
>>
>> The enum definitely feels like the right abstraction. But the resulting
>> API is clunky because of naming and top-level scarcity.
>>
>> Hence the suggestion to tag it onto np.copy, but there is an argument to
>> be made for consistency by placing all enums under np.flags or similar.
>>
>> Still, np.flags.copy.IF_NEEDED gets long.
>>
>> Stéfan
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>


Re: [Numpy-discussion] `keepdims=True` for argmin/argmax and C-API `PyArray_ArgMaxWithKeepdims`

2021-07-01 Thread Gagandeep Singh
Hi,

So should I remove these new functions from the public C-API? Let me know
and I will do that.

On Thu, 1 Jul, 2021, 10:02 pm Sebastian Berg wrote:

> On Thu, 2021-07-01 at 00:39 -0700, Stefan van der Walt wrote:
> > Hi Sebastian,
> >
> > On Wed, Jun 30, 2021, at 18:23, Sebastian Berg wrote:
> > > The PR https://github.com/numpy/numpy/pull/19211 proposes to extend
> > > argmin and argmax with a `keepdims=False` keyword-only argument.
> >
> > This seems consistent with existing APIs, so I'm not concerned.
> >
> > For those wondering, `keepdims` preserves the number of dimensions of
> > the original array in a reduction operation like `sum`:
> >
> > In [1]: X = np.random.random((10, 15))
> >
> > In [2]: np.sum(X).shape
> > Out[2]: ()
> >
> > In [3]: np.sum(X, keepdims=True).shape
> > Out[3]: (1, 1)
> >
> > This is sometimes useful for broadcasting.
> >
> > > The PR also proposes to add:
> > >
> > > * `PyArray_ArgMinWithKeepdims`
> > > * `PyArray_ArgMaxWithKeepdims`
> >
> > I am curious whether this is our general pattern for adding keyword
> > argument functionality to functions in the C-API.  It seems a bit
> > excessive!
>
> True, I am now tending a bit towards delaying this until someone
> actually asks for it...
> In most use-cases just using the Python API is likely only a small
> overhead anyway if done right.
>
> I do not think we have a pattern.  We do have some functions with the
> pattern of `With...And...` to allow signatures of different complexity.
> But very few of this type of python additions ever made it into the C-
> API.  For `Reshape`, `order=` was added by introducing `NewShape`.
>
> I have some hope that very long-term, HPy might solve this for us...
>
> Cheers,
>
> Sebastian
>
>
>
> >
> > Stéfan


Re: [Numpy-discussion] `keepdims=True` for argmin/argmax and C-API `PyArray_ArgMaxWithKeepdims`

2021-07-01 Thread Gagandeep Singh
Hi,

I have removed the two new C functions
<https://github.com/numpy/numpy/pull/19211/commits/4be86dd0400d4f52fd72dbc312d69942fd5f7c73>
from the public C-API. Let me know if anything else is needed.

Thanks.

On Fri, Jul 2, 2021 at 2:10 AM Matti Picus wrote:

>
> On 1/7/21 7:49 pm, Gagandeep Singh wrote:
> > Hi,
> >
> > So should I remove these new functions from public C-API? Let me know.
> > I will do that.
> >
> >
>
> Yes please. If needed we can add them, but once in we cannot remove them.
>
> Matti
>