[Numpy-discussion] Proposing a flattening functionality for deeply nested lists in NumPy

2024-12-30 Thread Mark via NumPy-Discussion
Hello all,


Many people have asked how to flatten a nested list into a one-dimensional
list (e.g., see this StackOverflow thread). While flattening a 2D list is
relatively straightforward, deeply nested lists can become cumbersome to
handle. To address this challenge, I propose adding a built-in
list-flattening functionality to NumPy.



By adding this feature to NumPy, the library would not only simplify a
frequently used task but also enhance its overall usability, making it an
even more powerful tool for data manipulation and scientific computing.



The code snippet below demonstrates how a nested list can be flattened,
enabling conversion into a NumPy array. I believe this would be a valuable
addition to the package. See also this issue.


from collections.abc import Iterable


def flatten_list(iterable):
    """
    Flatten a (nested) list into a one-dimensional list.

    Parameters
    ----------
    iterable : iterable
        The input collection.

    Returns
    -------
    flattened_list : list
        A one-dimensional list containing all the elements from the input,
        with any nested structures flattened.

    Examples
    --------
    Flattening a list containing nested lists:

    >>> obj = [[1, 2, 3], [1, 2, 3]]
    >>> flatten_list(obj)
    [1, 2, 3, 1, 2, 3]

    Flattening a list with sublists of different lengths:

    >>> obj = [1, [7, 4], [8, 1, 5]]
    >>> flatten_list(obj)
    [1, 7, 4, 8, 1, 5]

    Flattening a deeply nested list:

    >>> obj = [1, [2], [[3]], [[[4]]]]
    >>> flatten_list(obj)
    [1, 2, 3, 4]

    Flattening a list with various types of elements:

    >>> obj = [1, [2], (3), (4,), {5}, np.array([1, 2, 3]), range(3), 'Hello']
    >>> flatten_list(obj)
    [1, 2, 3, 4, 5, 1, 2, 3, 0, 1, 2, 'Hello']

    """
    # Non-iterables (and strings, which we treat as atomic) are wrapped as-is.
    if not isinstance(iterable, Iterable) or isinstance(iterable, str):
        return [iterable]

    def flatten_generator(iterable):
        for item in iterable:
            if isinstance(item, Iterable) and not isinstance(item, str):
                yield from flatten_generator(item)
            else:
                yield item

    return list(flatten_generator(iterable))
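
As a minimal usage sketch (assuming the `flatten_list` definition above), the
flattened result can be handed straight to `np.array`, which a ragged nested
list cannot:

import numpy as np

ragged = [1, [7, 4], [8, 1, 5]]
# np.array(ragged) would not give a clean 1-D numeric array,
# but the flattened list does:
arr = np.array(flatten_list(ragged))
print(arr)          # [1 7 4 8 1 5]
print(arr.shape)    # (6,)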


[Numpy-discussion] surprising behavior from array indexing

2024-12-30 Thread Mark Harfouche via NumPy-Discussion
Happy new year everybody!

I've been upgrading my code to start to support array indexing and in my
tests I found something that was well documented, but surprising to me.

I've tried to read through
https://numpy.org/doc/stable/user/basics.indexing.html#combining-advanced-and-basic-indexing
and even after multiple passes, I still find it very terse...

Consider a multi-dimensional dataset:

import numpy as np
shape = (10, 20, 30)
original = np.arange(np.prod(shape)).reshape(shape)

Let's consider we want to collapse dim 0 to a single entry
Let's consider we want a subset from dim 1, with a slice
Let's consider that we want 3 elements from dim 2

i = 2
j = slice(1, 6)
k = slice(7, 10)
out_basic = original[i, j, k]
assert out_basic.shape == (5, 3)

Now consider we want to provide freedom to have instead of a slice for k,
an arbitrary "array"

k = [7, 11, 13]
out_array = original[i, j, k]
assert out_array.shape == (5, 3), f"shape is actually {out_array.shape}"

AssertionError: shape is actually (3, 5)

To get the result "Mark expects", one has to do it in two steps

integer_types = (int, np.integer)
integer_indexes = (
    i if isinstance(i, integer_types) else slice(None),
    j if isinstance(j, integer_types) else slice(None),
    k if isinstance(k, integer_types) else slice(None),
)
non_integer_indexes = (
    ((i,) if not isinstance(i, integer_types) else ()) +
    ((j,) if not isinstance(j, integer_types) else ()) +
    ((k,) if not isinstance(k, integer_types) else ())
)
out_double_indexed = original[integer_indexes][non_integer_indexes]
assert out_double_indexed.shape == (5, 3), f"shape is actually {out_double_indexed.shape}"

This is very surprising to me. I totally understand that this kind of
indexing won't change in numpy, but is there a way I can adjust my indexing
strategy to regain the ability to slice into my array in a "single shot"?

The main use case is arrays that are truly huge, but chunked in ways where
slicing into them can be quite efficient. This is multi-dimensional imaging
data. Each chunk is quite "huge", so this kind of metadata manipulation is
worthwhile to avoid unnecessary IO.

Perhaps there is a "simple" distinction I am missing, for example using a
tuple for k instead of a list.

Thanks for your input!

Mark

(I tried to keep my code copy pastable)


[Numpy-discussion] Re: Proposing a flattening functionality for deeply nested lists in NumPy

2024-12-30 Thread matti picus via NumPy-Discussion
On Tue, 31 Dec 2024 at 06:41, Mark via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> Hello all,
>
>
> Many people have asked how to flatten a nested
>

I think this is out of scope for NumPy. Our bar for adding functionality is
quite high. We are unlikely to consider generic routines that are not
directly connected to NumPy-style homogeneous ND arrays, and even then we
will look first for consensus with the Array API standard. In truth, our
direction is to deprecate and remove parts of the library that are not
directly related to the Array API.

Matti



[Numpy-discussion] Re: Proposing a flattening functionality for deeply nested lists in NumPy

2024-12-30 Thread Dom Grigonis via NumPy-Discussion
Hi Mark,

I think this has already been implemented.

For Iterables (lists included), there is `more_itertools.collapse`, which you 
can use to reproduce the examples in your code as 
`list(more_itertools.collapse(list))`. See: 
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.collapse

If you don’t mind intermediate conversions:
a) For well-behaved arrays, such as `numpy.ndarray`, you can do 
`np.ravel(list).tolist()`, or there is the somewhat more flexible `torch.flatten` (I 
think `numpy` could add this functionality too - it is quite useful).
b) For ragged arrays, there is `awkward.flatten`.

Given that your need is not related to `numpy.ndarray` objects, I don’t think 
`numpy` is the right place for it. I would instead suggest exploring 
`more_itertools` and, if something is missing, raising `Iterator`-related 
proposals over there.
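
A quick sketch of both options (assuming `more_itertools` and `numpy` are
installed), using the deeply nested example from your docstring:

import numpy as np
from more_itertools import collapse

# collapse flattens arbitrarily nested iterables, leaving strings intact
obj = [1, [2], [[3]], [[[4]]]]
print(list(collapse(obj)))                        # [1, 2, 3, 4]

# For a regular (non-ragged) nested list, np.ravel also works:
print(np.ravel([[1, 2, 3], [1, 2, 3]]).tolist())  # [1, 2, 3, 1, 2, 3]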


> On 31 Dec 2024, at 06:37, Mark via NumPy-Discussion 
>  wrote:
> 
> Hello all,
> 
> Many people have asked how to flatten a nested list into a one-dimensional
> list. While flattening a 2D list is relatively straightforward, deeply
> nested lists can become cumbersome to handle. To address this challenge, I
> propose adding a built-in list-flattening functionality to NumPy.
> 
> [...]



[Numpy-discussion] Re: surprising behavior from array indexing

2024-12-30 Thread Mark Harfouche via NumPy-Discussion
On Mon, Dec 30, 2024 at 1:51 PM Robert Kern  wrote:

> No, there's no simple solution that you're missing. The kind of indexing
> that you want has been considered in NEP 21 (which called it "orthogonal
> indexing"), which saw some progress, but has largely been left fallow.
> There's been some movement on the Array API specification side, so that
> might spur some movement.
>
> https://numpy.org/neps/nep-0021-advanced-indexing.html
>

Thank you Robert,

I will have a read through NEP 21. It seems like it discusses many aspects
that I ran into.


[Numpy-discussion] Re: surprising behavior from array indexing

2024-12-30 Thread Robert Kern via NumPy-Discussion
On Mon, Dec 30, 2024 at 10:28 AM Mark Harfouche via NumPy-Discussion <
numpy-discussion@python.org> wrote:


No, there's no simple solution that you're missing. The kind of indexing
that you want has been considered in NEP 21 (which called it "orthogonal
indexing"), which saw some progress, but has largely been left fallow.
There's been some movement on the Array API specification side, so that
might spur some movement.

https://numpy.org/neps/nep-0021-advanced-indexing.html

Jaime Rio made a pure Python implementation that might work for you (though
I'm not sure about the performance for large arrays with big slices), but
it's buried in a closed PR (still works, though):

https://github.com/numpy/numpy/pull/5749/files

>>> original_oindex = OrthogonalIndexer(original)
>>> original_oindex[i, j, k]
array([[1237, 1241, 1243],
       [1267, 1271, 1273],
       [1297, 1301, 1303],
       [1327, 1331, 1333],
       [1357, 1361, 1363]])

Note that some other array implementations like Xarray and Zarr provide the
`.oindex` property which does the orthogonal indexing semantics (roughly)
per NEP 21, so those might be options for you.
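
For illustration, here is a rough sketch of orthogonal-style indexing built on
`np.ix_` (the `oindex` helper below is hypothetical, not a NumPy API; it
expands slices to explicit index ranges and drops the axes that came from
scalar indices):

import numpy as np

def oindex(arr, *keys):
    # Hypothetical helper: emulate orthogonal ("outer") indexing by turning
    # every key into a 1-D index sequence and letting np.ix_ build the mesh.
    seqs, scalar_axes = [], []
    for axis, key in enumerate(keys):
        if isinstance(key, slice):
            seqs.append(range(*key.indices(arr.shape[axis])))
        elif isinstance(key, (int, np.integer)):
            seqs.append([key])          # length-1 sequence so np.ix_ accepts it
            scalar_axes.append(axis)    # drop this axis from the result later
        else:
            seqs.append(key)            # already an array-like of indices
    out = arr[np.ix_(*seqs)]
    return out.squeeze(axis=tuple(scalar_axes)) if scalar_axes else out

shape = (10, 20, 30)
original = np.arange(np.prod(shape)).reshape(shape)
print(oindex(original, 2, slice(1, 6), [7, 11, 13]).shape)   # (5, 3)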

-- 
Robert Kern