[Numpy-discussion] Speeding up isin1d and adding a "method" or similar

2022-06-16 Thread Sebastian Berg
Hi all,

there is a PR to add a faster path to `np.isin` that uses a lookup
table for all the elements that are included in the haystack
(`test_elements`):

https://github.com/numpy/numpy/pull/12065/files

Such a table can carry a very significant memory overhead, but the
speedup can be just as significant, so the idea came up of adding an
option to pick which version is used.
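
As a rough illustration of the lookup-table idea, here is a minimal sketch for integer inputs. The function name `isin_lookup_table` is hypothetical and this is not the code from the PR; it only shows why the memory cost grows with the value range of `test_elements` rather than with its length.

```python
import numpy as np


def isin_lookup_table(ar1, ar2):
    """Hypothetical sketch: integer membership test via a boolean table."""
    # Assumption: both inputs hold integers that fit in int64.
    ar1 = np.asarray(ar1, dtype=np.int64)
    ar2 = np.asarray(ar2, dtype=np.int64)
    result = np.zeros(ar1.shape, dtype=bool)
    if ar2.size == 0:
        return result  # nothing can match an empty haystack

    lo, hi = int(ar2.min()), int(ar2.max())
    # One boolean slot per integer in [lo, hi]: this is the memory overhead.
    table = np.zeros(hi - lo + 1, dtype=bool)
    table[ar2 - lo] = True

    # Only values inside [lo, hi] can be present; look those up in the table.
    in_range = (ar1 >= lo) & (ar1 <= hi)
    result[in_range] = table[ar1[in_range] - lo]
    return result


needles = np.array([0, 1, 5, 9, -3])
haystack = np.array([1, 9, 4])
print(isin_lookup_table(needles, haystack).tolist())
# [False, True, False, True, False]
```

With a haystack such as `[0, 10**9]` the table would need a billion slots for two elements, which is the case where a sort-based path is the better fit.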

The current draft documentation for this new `method` keyword argument
is included at the end of this message.  The main questions are:

* Is there any concern about adding such a new kwarg?
* Is `method` the best name?  Sorting uses `kind`, which may also be good.

There is also the smaller question of what heuristic 'auto' would use,
but that can be tweaked at any time.

```
method : {'auto', 'sort', 'dictionary'}, optional
    The algorithm to use. This will not affect the final result,
    but will affect the speed. Default is 'auto'.

    - If 'sort', will use a mergesort-based approach. This will have
      a memory usage of roughly 6 times the sum of the sizes of
      `ar1` and `ar2`, not accounting for size of dtypes.
    - If 'dictionary', will use a key-dictionary approach similar
      to a counting sort. This is only available for boolean and
      integer arrays. This will have a memory usage of the
      size of `ar1` plus the max-min value of `ar2`. This tends
      to be the faster method if the following formula is true:
      `log10(len(ar2)) > (log10(max(ar2)-min(ar2)) - 2.27) / 0.927`,
      but may use greater memory.
    - If 'auto', will automatically choose the method which is
      expected to perform the fastest, using the above
      formula. For larger sizes or smaller range,
      'dictionary' is chosen. For larger range or smaller
      sizes, 'sort' is chosen.
```
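
To make the heuristic in the draft docstring concrete, the sketch below evaluates the quoted inequality for a given length and value range. The helper name `prefers_dictionary` is hypothetical, and the constants are simply copied from the draft above, so they would change if the heuristic is tweaked.

```python
import math


def prefers_dictionary(n_elements, value_range):
    """Evaluate the inequality quoted in the draft docstring.

    n_elements  -- len(ar2), assumed > 0
    value_range -- max(ar2) - min(ar2), assumed > 0
    """
    lhs = math.log10(n_elements)
    rhs = (math.log10(value_range) - 2.27) / 0.927
    return lhs > rhs


# 1000 elements over a range of 10**4: 3 > (4 - 2.27) / 0.927, about 1.87
print(prefers_dictionary(1_000, 10**4))  # True
# 1000 elements over a range of 10**9: 3 > (9 - 2.27) / 0.927, about 7.26
print(prefers_dictionary(1_000, 10**9))  # False
```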

Cheers,

Sebastian


___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] argmin/argmax underspecification?

2022-06-16 Thread TODD ANDERSON
The docs say "Returns the indices of the maximum values along an axis."  When 
axis=None, I presume the implementation is actually "Returns the first index 
(assuming ravel ordering) of the maximum value."  I also presume that people 
are now relying on the fact that it always returns the first such index.  So, 
maybe the docs should be updated?
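
A small check of the behaviour described above, using only `np.argmax`, `np.argmin`, and `np.unravel_index`. The array values are made up for illustration.

```python
import numpy as np

a = np.array([[1, 7, 3],
              [7, 0, 7]])

# With axis=None the array is treated as flattened (ravel order), and the
# index of the first occurrence of the maximum is returned.
flat_index = np.argmax(a)
print(flat_index)  # 1

# Convert the flat index back to a position in the 2-D array.
position = tuple(int(i) for i in np.unravel_index(flat_index, a.shape))
print(position)  # (0, 1)

# argmin likewise reports the first occurrence of the minimum.
first_min = int(np.argmin(np.array([4, 2, 2, 9])))
print(first_min)  # 1
```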


[Numpy-discussion] Re: assert_array_equal scalar broadcast behaviour

2022-06-16 Thread Matti Picus


On 16/6/22 19:28, Jon Morris wrote:


Hello all,

I was recently tripped up by issue #9542, where a call to
assert_array_equal unexpectedly passed because a single scalar can be
declared equal to an array if every value in the array is the same.
I’ve created pull #21595 to address the problem – what does everyone
think? Is this an acceptable solution or is there a better way to
resolve this issue?


Many thanks,

Jon

Jon Morris

Software Developer



The proposed solution, to add a "strict" kwarg to assert_array_equal, 
looks good to me. Thanks!
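
For reference, the surprising behaviour can be reproduced in a few lines. The comparison with `np.array_equal` is only an illustration of a shape-aware check; it is not the implementation from the pull request.

```python
import numpy as np
from numpy.testing import assert_array_equal

arr = np.array([3, 3, 3])

# Passes without error: the scalar is broadcast against the array
# before the element-wise comparison.
assert_array_equal(3, arr)

# np.array_equal also compares shapes, so it tells the two apart.
scalar_matches = bool(np.array_equal(3, arr))
list_matches = bool(np.array_equal([3, 3, 3], arr))
print(scalar_matches)  # False: shape () vs shape (3,)
print(list_matches)    # True
```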


Matti
