[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

2022-06-29 Thread Miles Cranmer
My pleasure! Cheers, Miles ___ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-le...@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: arch...@mail

[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

2022-06-29 Thread Miles Cranmer
So, this new method is in fact a hash table as discussed in that blog post. However, because it assumes integer arrays, we can go even further than that blog, and simply use `np.arange(ar_min, ar_max + 1)` as the "hash table". Thus, you don't actually need to use a hashing function at all, you c
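To illustrate the point about not needing a hashing function, here is a minimal sketch, assuming a small integer array. The names `table` and `slots` are illustrative only and are not taken from the PR. Each value's slot is simply its offset from the minimum, so distinct values can never collide.

```python
import numpy as np

ar = np.array([12, 15, 12, 18])
ar_min, ar_max = int(ar.min()), int(ar.max())

# The "hash table": one slot per integer between the minimum and maximum.
table = np.arange(ar_min, ar_max + 1)

# The "hash" of a value is just its offset from the minimum.
slots = ar - ar_min

# Looking up each slot recovers the original value, so no collisions occur.
assert np.array_equal(table[slots], ar)
```

Repeated values (12 here) map to the same slot, which is what makes this usable for finding unique elements.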

[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

2022-06-28 Thread Miles Cranmer
Regarding 2., did you have a particular approach in mind? This new lookup table method is already O(n) scaling (similar to a counting sort), so I cannot fathom a method that, as you suggest, would get significantly better performance for integer arrays. The sorting here is "free" in some sense s

[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

2022-06-28 Thread Miles Cranmer
Ah, I did not clarify this: `kind="table"` will *also* return a sorted array. It simply does not use a sorting algorithm to get to it. This is because the table is generated using `np.arange` (i.e., already sorted) which is then masked.
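A rough sketch of why the masked table comes out sorted, assuming integer input. `unique_via_table` is a hypothetical helper written for this illustration; it is not the PR's code and not a NumPy function.

```python
import numpy as np

def unique_via_table(ar):
    """Hypothetical helper: sorted unique integers without calling a sort."""
    ar = np.asarray(ar).ravel()
    if ar.size == 0:
        return ar.copy()
    ar_min, ar_max = int(ar.min()), int(ar.max())
    # Mark which integers in [ar_min, ar_max] occur in the input.
    seen = np.zeros(ar_max - ar_min + 1, dtype=bool)
    seen[ar - ar_min] = True
    # The candidates are generated in increasing order, so masking them
    # gives an already-sorted result.
    candidates = np.arange(ar_min, ar_max + 1)
    return candidates[seen]

values = np.array([7, 3, 3, 9, 7, 4])
result = unique_via_table(values)
```

For this input the result matches what `np.unique(values)` returns, although no sorting routine is called. The sketch ignores details such as preserving the input dtype or guarding against overflow for very wide value ranges.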

[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

2022-06-28 Thread Miles Cranmer
Thanks for the comments, Ralf!
> You cannot switch the default behavior, that will break backwards compatibility.
The default `kind=None` has no effect on input/output behavior of the function. The only changes a user will see are in terms of speed and memory usage. `unique` will select this

[Numpy-discussion] Speeding up `unique` and adding "kind" parameter

2022-06-28 Thread Miles Cranmer
Dear all, There is a PR that adds a lookup table approach to `unique`, shown below. You can get up to ~16x speedup for large integer arrays, at the cost of potentially greater memory usage. https://github.com/numpy/numpy/pull/21843 This is controlled by a new `kind` parameter, which is describ
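To make the memory trade-off concrete: a table that spans the minimum to the maximum value grows with the range of the data, not with the number of elements. The sketch below assumes that sizing; `table_slots` is a hypothetical helper, not part of NumPy or the PR.

```python
import numpy as np

def table_slots(ar):
    """Hypothetical helper: slots needed by a min-to-max lookup table."""
    ar = np.asarray(ar)
    return int(ar.max()) - int(ar.min()) + 1

# Two arrays of the same length but very different value ranges.
dense = np.array([5, 6, 7, 8])
sparse = np.array([0, 1, 2, 1_000_000])

dense_slots = table_slots(dense)
sparse_slots = table_slots(sparse)
```

Here `dense` needs 4 slots while `sparse` needs 1,000,001, even though both hold four elements. This is the kind of case where a sort-based method may be the better choice.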

Re: [Numpy-discussion] Performance feature for np.isin and np.in1d

2018-10-09 Thread Miles Cranmer
Hi, I was wondering how I could have this PR merged (https://github.com/numpy/numpy/pull/12065)? The discussion on the PR seems to have gone well and all tests pass. Cheers, Miles On Mon, Oct 1, 2018 at 2:36 PM Miles Cranmer wrote: > (Not sure what the right list is for this) > >

[Numpy-discussion] Fwd: Performance feature for np.isin and np.in1d

2018-10-01 Thread Miles Cranmer
(Not sure what the right list is for this) Hi, I have started a PR for a "fast_integers" flag for np.isin and np.in1d which greatly increases performance when both arrays are integral. It works by creating a boolean array with elements set to 1 where the parent array (ar2) has elements and 0 othe
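The boolean-array idea can be sketched roughly as follows, assuming both inputs are integer arrays. `isin_via_table` is a hypothetical helper for illustration; it is not the PR's implementation and does not use the proposed flag.

```python
import numpy as np

def isin_via_table(ar1, ar2):
    """Hypothetical helper: integer membership test via a boolean table."""
    ar1 = np.asarray(ar1)
    ar2 = np.asarray(ar2)
    if ar2.size == 0:
        return np.zeros(ar1.shape, dtype=bool)
    ar2_min, ar2_max = int(ar2.min()), int(ar2.max())
    # One slot per integer in [ar2_min, ar2_max]; True where ar2 has it.
    table = np.zeros(ar2_max - ar2_min + 1, dtype=bool)
    table[ar2 - ar2_min] = True
    # Values of ar1 outside the table's range cannot be in ar2.
    in_range = (ar1 >= ar2_min) & (ar1 <= ar2_max)
    result = np.zeros(ar1.shape, dtype=bool)
    result[in_range] = table[ar1[in_range] - ar2_min]
    return result

ar1 = np.array([0, 3, 7, 10, -2])
ar2 = np.array([3, 10, 5])
mask = isin_via_table(ar1, ar2)
```

For these inputs `mask` agrees with `np.isin(ar1, ar2)`. Each element of `ar1` costs one table lookup, at the price of a table whose size depends on the range of `ar2`.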