date:20220623

[Numpy-discussion] MERGED: Speeding up in1d and adding a "method" or similar

2022-06-23 Thread Sebastian Berg

Hi all,

just a note that I merged the PR with the following semantics:

A new `kind` keyword-only argument:
* `kind=None` uses a heuristic based on a memory bound to decide
  which method to use
* `kind="table"` uses the new approach (integer arrays only)
* `kind="sort"` forces the old behavior

The new documentation is available at:
https://numpy.org/devdocs/reference/generated/numpy.in1d.html

It seems this addition should be useful in many cases, but if you have
any concern about the choice of API please comment!

Cheers,

Sebastian


On Thu, 2022-06-16 at 06:08 -0700, Sebastian Berg wrote:
> Hi all,
> 
> there is a PR to add a faster path to `np.isin` that uses a lookup
> table for all the elements that are included in the haystack
> (`test_elements`):
> 
>     https://github.com/numpy/numpy/pull/12065/files
> 
> Such a table means that the memory overhead can be very significant,
> but so can the speedup, so there was the idea of adding an option to
> pick which version is used.
> 
> The current documentation for this new `method` keyword argument is
> included below.  The main questions are:
> 
> * Is there any concern about adding such a new kwarg?
> * Is `method` the best name?  Sorting uses `kind`, which may also be
>   good.
> 
> There is also the smaller question of what heuristic 'auto' would
> use, but that can be tweaked at any time.
> 
> ```
>     method : {'auto', 'sort', 'dictionary'}, optional
>         The algorithm to use. This will not affect the final result,
>         but will affect the speed. Default is 'auto'.
> 
>         - If 'sort', will use a mergesort-based approach. This will
>           have a memory usage of roughly 6 times the sum of the sizes
>           of `ar1` and `ar2`, not accounting for size of dtypes.
>         - If 'dictionary', will use a key-dictionary approach similar
>           to a counting sort. This is only available for boolean and
>           integer arrays. This will have a memory usage of the
>           size of `ar1` plus the max-min value of `ar2`. This tends
>           to be the faster method if the following formula is true:
>           `log10(len(ar2)) > (log10(max(ar2)-min(ar2)) - 2.27) / 0.927`,
>           but may use greater memory.
>         - If 'auto', will automatically choose the method which is
>           expected to perform the fastest, using the above
>           formula. For larger sizes or smaller range,
>           'dictionary' is chosen. For larger range or smaller
>           sizes, 'sort' is chosen.
> ```
> 
> Cheers,
> 
> Sebastian
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net

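To make the trade-off in the quoted message concrete, here is a minimal pure-Python sketch of a lookup-table membership test, together with the speed formula from the draft docstring. It is not NumPy's implementation: the function names `isin_table` and `draft_prefers_table` are hypothetical, and the zero-range shortcut is an assumption added so the logarithm is always defined.

```python
import math


def isin_table(ar1, ar2):
    """Membership test using a boolean table spanning ar2's value range.

    Illustration only (hypothetical helper, integers assumed).
    """
    if not ar2:
        return [False] * len(ar1)
    lo, hi = min(ar2), max(ar2)
    # One slot per integer in [lo, hi]: this is the memory cost that
    # grows with max(ar2) - min(ar2).
    table = [False] * (hi - lo + 1)
    for value in ar2:
        table[value - lo] = True
    # Values outside [lo, hi] cannot be in ar2, so skip the lookup.
    return [lo <= value <= hi and table[value - lo] for value in ar1]


def draft_prefers_table(ar2):
    """Evaluate the speed formula quoted in the draft docstring."""
    value_range = max(ar2) - min(ar2)
    if value_range == 0:
        # Assumption: a single distinct value needs only a one-slot table.
        return True
    return math.log10(len(ar2)) > (math.log10(value_range) - 2.27) / 0.927


print(isin_table([0, 1, 2, 5, 0], [0, 2]))
print(draft_prefers_table(list(range(1000))))  # many values, small range
print(draft_prefers_table([0, 10**9]))         # few values, huge range
```

With many elements packed into a small range the formula favors the table, while two elements a billion apart would need a table of about a billion slots, and the formula favors sorting instead.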



[Numpy-discussion] StackOverflow Developer Survey

2022-06-23 Thread Michael Siebert

Hi all,

I just found some more survey data on Numpy (you need to scroll down a
little to "Other frameworks and libraries"):

https://survey.stackoverflow.co/2022

Numpy seems to enjoy an exceptional position as a library over a wide spectrum 
of programming languages: rank 2 overall and among professionals and even rank 
1 with beginners.

Well deserved!

Best, Michael
