Re: [Numpy-discussion] EHN: Discusions about 'add numpy.topk'

2021-05-31 Thread Jonathan Fine
Here's my opinion, as a bit of an outsider. Mainly, I understand MAX to
mean the largest value in a finite totally ordered set. I understand TOP to
mean the 'best' member of a finite set.

For example, on a mountain each point has a HEIGHT. There will be a MAX
HEIGHT. The point(s) on the mountain that is the highest is the SUMMIT. Or
in other words the TOP of the mountain. Or another example, there are TOP
40 charts for music. https://www.officialcharts.com/

To summarize, use MAX for the largest value in a totally ordered set. Use
TOP when you have a height (or similar) function applied to an unordered
set. The highest temperature in 2021 will occur on the hottest day(s). One
is a temperature, the other a date.

I'm an outsider, and I've not made an effort to gain special knowledge
about the domain prior to posting this opinion. I hope it helps. Please
ignore it if it does not.

-- 
Jonathan
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] EHN: Discusions about 'add numpy.topk'

2021-05-31 Thread Ralf Gommers
On Sun, May 30, 2021 at 10:01 AM Matti Picus  wrote:

>
>
> Did this function come up at all in the array-API consortium dicussions?
>

It happens to be in this list of functions which was made last week:
https://github.com/data-apis/array-api/issues/187. That list is potential
next candidates, based on them being implemented in most but not all
libraries. There was no real discussion on `topk` specifically though.

The current version of the array API standard basically contains
functionality that is either common to all libraries, or that NumPy has and
most other libraries have as well. Given how much harder it is to get
functions into NumPy than in other libraries, the "most libraries have it,
NumPy does not" set of functions was not investigated much yet. That's also
the reason NEP 47 doesn't have any new functions to be added to NumPy
except for `from_dlpack`, but only consistency changes like adding keepdims
keywords, stacking for linalg functions that are missing that, etc.

Cheers,
Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] EHN: Discusions about 'add numpy.topk'

2021-05-31 Thread Ralf Gommers
On Sun, May 30, 2021 at 10:41 AM  wrote:

> >
> >
> > On Fri, May 28, 2021 at 4:58 PM  > > wrote:
> >
> > Hi all,
> >
> > Finding topk elements is widely used in several fields, but missed
> > in NumPy.
> > I implement this functionality named as  numpy.topk using core numpy
> > functions and open a PR:
> >
> > https://github.com/numpy/numpy/pull/19117
> > 
> >
> > Any discussion are welcome.
> >
> >
> > Thanks for the proposal Kang. I think this functionality is indeed a
> > fairly obvious gap in what Numpy offers, and would make sense to add.
> > A detailed comparison with other libraries would be very helpful here.
> > TensorFlow and JAX call this function `top_k`, while PyTorch, Dask and
> > MXNet call it `topk`.
> >
> > Two things to look at in more detail here are:
> > 1. complete signatures of the function in each of those libraries, and
> > what the commonality is there.
> > 2. the argument Eric made on your PR about consistency with
> > sort/argsort, and if we want topk/argtopk? Also, do other libraries
> > have `argtopk`?
> >
> > Cheers,
> > Ralf
> >
> >
> > Best wishes,
> >
> > Kang Kai
> >
>
> Hi, Thanks for reply, I present some details below:
>

Thanks for the detailed investigation Kang!


>
> ## 1. complete signatures of the function in each of those libraries, and 
> what the commonality is there.
>
>
> | Library | Name   | arg1  | arg2 | arg3 | arg4  | arg5   
> |
>
> |-||---|--|--|---||
> | NumPy [1
> ]   | numpy.topk | a | k| axis | largest   | sorted |
> | PyTorch [2
> ] | torch.topk | input | k| dim  | largest   | sorted |
> | R [3
> ]   | topK   | x | K| /| / | /  |
> | MXNet [4
> ]   | mxnet.npx.topk | data  | k| axis | is_ascend | /  |
> | CNTK [5
> ]| cntk.ops.top_k | x | k| axis | / | /  |
> | TF [6
> ]  | tf.math.top_k  | input | k| /| / | sorted |
> | Dask [7
> ]| dask.array.topk| a | k| axis | -k| /  |
> | Dask [8
> ]| dask.array.argtopk | a | k| axis | -k| /  |
> | MATLAB [9
> ]  | mink   | A | k| dim  | / | /  |
> | MATLAB [10
> ] | maxk   | A | k| dim  | / | /  |
>
>
> | Library | Name   | Returns |
> |-||-|
> | NumPy [1]   | numpy.topk | values, indices |
> | PyTorch [2] | torch.topk | values, indices |
> | R [3]   | topK   | indices |
> | MXNet [4]   | mxnet.npx.topk | controls by ret_typ |
> | CNTK [5]| cntk.ops.top_k | values, indices |
> | TF [6]  | tf.math.top_k  | values, indices |
> | Dask [7]| dask.array.topk| values  |
> | Dask [8]| dask.array.argtopk | indices |
> | MATLAB [9]  | mink   | values, indices |
> | MATLAB [10] | maxk   | values, indices |
>
> - arg1: Input array.
> - arg2: Number of top elements to look for along the given axis.
> - arg3: Axis along which to find topk.
> - R only supports vector, TensorFlow only supports axis=-1.
> - arg4: Controls whether to return k largest or smallest elements.
> - R, CNTK and TensorFlow only return k largest elements.
> -
>  In Dask, k can be negative, which means to return k smallest elements.
> - In MATLAB, use two distinct functions.
> - arg5: If true the resulting k elements will be sorted by the values.
> - R, MXNet, CNTK, Dask and MATLAB only return sorted elements.
>
> **Summary**:
> - Function Name: could be `topk`, `top_k`, `mink`/`maxk`.
> - arg1 (a), arg2 (k), arg3 (axis): should be required.
> - arg4 (largest), arg4 (sorted): might be discussed.
> - Returns: discussed below.
>
>
> ## 2. the argument Eric made on your PR about consistency with sort/argsort, 
> if we want topk/argtopk? Also, do other libraries have `argtopk`
>
> In most libraries, `topk` or `top_k` returns both values and indices, and
> `argtopk` is not included except for Dask. In addition, there is another
> inconsistency: `sort` returns ascending values, but `topk` returns
> descending values.
>
> ## Suggestions
> Finally, IMHO, new function signature might be designed as one of:
> I) use `topk` / `argtopk` or `top_k` / `argtop_k`
> ```python
> def topk(a, k, axis=-1, sorted=True) -> topk_values
> def argtopk(a, k, axis=-1, sorted=True) -> topk_indices
> ```
> or
> ```python
> def top_k(a, k, axis=-1, sorted=True) -> topk_values
> def argtop_k(a, k, axis=-1, sorted=True) -> topk_indices
> ```
> where `k` can be negative which means to return k smallest elements.
>

I don't think I'm a fan of the `-k` cleverness. Saying you want `-5` values
as a stand-in