[Numpy-discussion] API: make numpy.lib._arraysetops.intersect1d work on multiple arrays #25688

2024-02-02 Thread Stephan Kuschel via NumPy-Discussion

Dear Community,

For my own work, I required the intersect1d function to work on multiple 
arrays while returning the indices (using `return_indices=True`). 
Consequently I changed the function in numpy and now I am seeking 
feedback from the community.


This is the corresponding PR: https://github.com/numpy/numpy/pull/25688

My motivation for the change may also apply to a larger group of people, 
as this need comes up often when analysing simulation data:


In many simulations, a large number of entities 
(particles, cells, vehicles, whatever the simulation consists of) are 
tracked throughout the run. A typical approach is to assign 
every entity a unique ID that stays constant throughout 
the simulation and is written, together with the entity's other 
properties, to every simulation snapshot. Note that entities may 
enter or leave the simulation over time and that, due to 
parallelization, the order of the entities is not preserved between snapshots.
Tracking the position of entities over, let's say, 100 snapshots therefore requires 
the intersection of 100 ID lists instead of only two.


Consequently I changed the intersect1d function from
`intersect1d(ar1, ar2, assume_unique=False, return_indices=False)` to
`intersect1d(*ars, assume_unique=False, return_indices=False)`.
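
As a rough illustration of the proposed call signature, here is a minimal sketch built on top of the existing two-array `np.intersect1d`. It is not the code from the PR: the helper name `intersect1d_multi` and the example arrays are made up for this illustration, and the implementation favours clarity over speed.

```python
import numpy as np


def intersect1d_multi(*ars, assume_unique=False, return_indices=False):
    """Hypothetical helper (not the PR code): intersect any number of 1-D arrays."""
    if len(ars) < 2:
        raise ValueError("need at least two arrays")
    ars = [np.asarray(a).ravel() for a in ars]

    # Fold the existing two-array function over all inputs.
    common = ars[0]
    for a in ars[1:]:
        common = np.intersect1d(common, a, assume_unique=assume_unique)
    if not return_indices:
        return common

    # For every input, look up the first occurrence of each common value.
    indices = []
    for a in ars:
        values, first = np.unique(a, return_index=True)
        indices.append(first[np.searchsorted(values, common)])
    return (common, *indices)


# Three "snapshots" with entity IDs stored in a different order each time.
ids0 = np.array([3, 1, 7, 5])
ids1 = np.array([5, 3, 9, 1])
ids2 = np.array([1, 8, 5, 3])

common, i0, i1, i2 = intersect1d_multi(ids0, ids1, ids2, return_indices=True)
# ids0[i0], ids1[i1] and ids2[i2] all list the common IDs in the same order,
# so the same index arrays can be used to align other per-entity properties.
```

With index arrays like these, per-entity properties such as positions can be aligned across snapshots by indexing each snapshot's property array with the matching index array.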

Please let me know if there is any interest in those changes -- be it in 
this form or another.


All the Best
Stephan
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: API: make numpy.lib._arraysetops.intersect1d work on multiple arrays #25688

2024-02-02 Thread Marten van Kerkwijk
> For my own work, I required the intersect1d function to work on multiple 
> arrays while returning the indices (using `return_indices=True`). 
> Consequently I changed the function in numpy and now I am seeking 
> feedback from the community.
>
> This is the corresponding PR: https://github.com/numpy/numpy/pull/25688



To me this looks like a very sensible generalization.  In terms of numpy
API, the only real change is that, effectively, the assume_unique and
return_indices arguments become keyword-only, i.e., in the unlikely case
that someone passed those as positional, a trivial backward-compatible
change will fix it.
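
To make the compatibility point concrete, a small sketch with made-up arrays: passing the options by keyword works with the current two-array signature and would keep working under a `*ars` signature, whereas a positional flag would then presumably be collected as a further input array.

```python
import numpy as np

a = np.array([4, 1, 3])
b = np.array([3, 4, 8])

# Positional form accepted by the current two-array signature:
#     np.intersect1d(a, b, True)
# Under a `*ars` signature, `True` would presumably be treated as a third array,
# so the keyword form below is the spelling that works either way.
common = np.intersect1d(a, b, assume_unique=True)

# The same applies to return_indices.
common_again, ia, ib = np.intersect1d(
    a, b, assume_unique=True, return_indices=True
)
```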

-- Marten


[Numpy-discussion] Re: API: make numpy.lib._arraysetops.intersect1d work on multiple arrays #25688

2024-02-02 Thread Dom Grigonis
Just curious, how much faster is it compared to the currently recommended `reduce` 
approach?
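
For reference, the `reduce` approach is the one suggested in the `np.intersect1d` documentation for more than two arrays. A minimal sketch with made-up arrays:

```python
from functools import reduce

import numpy as np

# Made-up ID lists standing in for three simulation snapshots.
snapshots = [
    np.array([3, 1, 7, 5]),
    np.array([5, 3, 9, 1]),
    np.array([1, 8, 5, 3]),
]

# Pairwise intersection folded over the list; the result is sorted and unique.
common_ids = reduce(np.intersect1d, snapshots)
```

Note that this form only yields the common values; it has no direct way to carry `return_indices=True` through the fold.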

DG



[Numpy-discussion] Re: API: make numpy.lib._arraysetops.intersect1d work on multiple arrays #25688

2024-02-02 Thread Dom Grigonis
Also, I don’t know if this could be of value, but my use case for this is to 
find overlaps, then split arrays into overlapping and non-overlapping segments.

Thus, it might be useful for `return_indices=True` to return indices of all 
instances, not only the first.

Also, in my case I need both overlapping and non-overlapping indices, but this 
would become ambiguous with more than 2 arrays.

If it were left with two input arrays, it could be extended to return both 
overlapping and non-overlapping parts. I think this could be another potential 
path to consider.
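
A minimal sketch of that two-array use case with the existing `np.isin`, assuming made-up input arrays. It returns the indices of all matching instances in `arr1`, not only the first, plus the non-overlapping ones:

```python
import numpy as np

arr1 = np.array([5, 2, 5, 9, 2, 7])
arr2 = np.array([2, 7, 7, 4])

# True where an element of arr1 also occurs somewhere in arr2.
in_both = np.isin(arr1, arr2)

overlap_idx = np.flatnonzero(in_both)       # every matching position in arr1
non_overlap_idx = np.flatnonzero(~in_both)  # positions with no match in arr2
```

The same mask computed the other way round (`np.isin(arr2, arr1)`) gives the split for `arr2`.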

E.g. what would be the speed comparison between the following?

    # current API: chain pairwise calls
    intr = np.intersect1d(arr1, arr2, assume_unique=False)
    intr = np.intersect1d(intr, np.unique(arr3), assume_unique=True)

    # vs. the proposed multi-array call
    intr = np.intersect1d(arr1, arr2, arr3, assume_unique=False)

Then, does the gain from such a generalisation justify the restriction it introduces?

Regards,
DG



[Numpy-discussion] Re: API: make numpy.lib._arraysetops.intersect1d work on multiple arrays #25688

2024-02-02 Thread Charles R Harris
On Fri, Feb 2, 2024 at 6:34 AM Stephan Kuschel via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> Dear Community,
>
> For my own work, I required the intersect1d function to work on multiple
> arrays while returning the indices (using `return_indices=True`).
> Consequently I changed the function in numpy and now I am seeking
> feedback from the community.
>
> This is the corresponding PR: https://github.com/numpy/numpy/pull/25688
>
> My motivation for the change may also apply to a larger group of people,
> as this need comes up often when analysing simulation data:
>
> In many simulations, a large number of entities
> (particles, cells, vehicles, whatever the simulation consists of) are
> tracked throughout the run. A typical approach is to assign
> every entity a unique ID that stays constant throughout
> the simulation and is written, together with the entity's other
> properties, to every simulation snapshot. Note that entities may
> enter or leave the simulation over time and that, due to
> parallelization, the order of the entities is not preserved between snapshots.
> Tracking the position of entities over, let's say, 100 snapshots therefore requires
> the intersection of 100 ID lists instead of only two.
>
> Consequently I changed the intersect1d function from
> `intersect1d(ar1, ar2, assume_unique=False, return_indices=False)` to
> `intersect1d(*ars, assume_unique=False, return_indices=False)`.
>
> Please let me know if there is any interest in those changes -- be it in
> this form or another.
>
>
Seems reasonable. I don't know if it is faster, but NumPy is all about
vectorization.

Chuck