[Numpy-discussion] Suggestion to show the shape in repr for summarized arrays

2024-09-30 Thread Marten van Kerkwijk
Hi All,

When the repr of an array is shown, the dtype and shape are currently
listed explicitly if they cannot be inferred directly from the list
that is shown, i.e., the dtype if it is not float64 or int64, and the
shape if the size of the array is zero but the shape is not the simple
(0,).

For instance,

```
np.empty((10,2,0), dtype="i2")
array([], shape=(10, 2, 0), dtype=int16)
```

I propose to also show the shape for the (rare) case that an array is
summarized, i.e., when it has more than the default threshold of 1000
elements and some elements are replaced by `...`.  The logic is that in
that case, too, the shape can no longer be read off the output, even
though it is useful information (e.g., when working in a notebook, which
is the original use case at https://github.com/numpy/numpy/issues/27461).

I have a PR for that at https://github.com/numpy/numpy/pull/27482
which would lead to the following:

```
np.arange(1001)
array([   0,    1,    2, ...,  998,  999, 1000], shape=(1001,))
```
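To check where summarization starts on a given installation, a small sketch along the following lines may help. The helper name `is_summarized` is made up for illustration, and looking for `...` in the repr is only a heuristic that assumes plain numeric arrays. The threshold is pinned to 1000 so the result does not depend on local print options.

```python
import numpy as np


def is_summarized(a):
    """Heuristic (hypothetical helper): True if repr(a) elides elements."""
    return "..." in repr(a)


# Pin the threshold to the default value so local settings do not matter.
with np.printoptions(threshold=1000):
    at_threshold = is_summarized(np.arange(1000))     # size == threshold
    above_threshold = is_summarized(np.arange(1001))  # size > threshold

print(at_threshold, above_threshold)
```

Only the second array is summarized, so only its repr would be affected by showing the shape.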

Just to be sure: this PR causes *no* change for any array with 1000 or
fewer elements, so I do not believe this change will lead to a lot of
unnecessary churn for downstream packages.  Indeed, between numpy and
astropy (which has lots of doctests), the only changes to (doc)tests
that were needed are the very few for arrays where the "threshold" is
explicitly exceeded.

One irritant is that the shape is not an argument that can be passed in
to an `np.array` call.  While this is just as much the case for
zero-sized arrays, perhaps a better solution would be to move the shape
information out of the parentheses, e.g., using ``...)  # shape=(...)``.
I can change the PR to do that if that's the consensus.
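To make the trade-off concrete, here is a rough sketch. It first confirms that `np.array` rejects a `shape` keyword, then evaluates a hand-written string in the comment style. That string is an assumption about what such output might look like, not something numpy produces.

```python
import numpy as np

# np.array has no `shape` parameter, so the repr of an empty
# multi-dimensional array cannot be passed back to it as written.
try:
    np.array([], shape=(10, 2, 0), dtype="int16")
    accepts_shape = True
except TypeError:
    accepts_shape = False

# Hand-written string in the comment style; NOT actual numpy output,
# just an illustration of the alternative.
commented = "array([], dtype=int16)  # shape=(10, 2, 0)"
result = eval(commented, {"array": np.array, "int16": np.int16})

print(accepts_shape)  # whether the keyword was accepted
print(result.shape)   # the comment is ignored when evaluating
```

With the comment form the string is a valid expression again, but evaluating it still does not recover the original shape, since the comment carries the information only for the reader.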

All the best,

Marten
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Suggestion to show the shape in repr for summarized arrays

2024-09-30 Thread Chris Barker via NumPy-Discussion
I like this.

While ideally eval(repr(an_object)) == an_object, in practice this is
already violated for large arrays -- so other than doctests, this shouldn't
cause too many headaches.
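A quick way to see this is the sketch below. The helper `round_trips` is hypothetical, and the threshold is pinned to 1000 so that local print options do not change the outcome.

```python
import numpy as np


def round_trips(a):
    """Hypothetical helper: does eval(repr(a)) reproduce the array a?"""
    try:
        b = eval(repr(a), {"array": np.array})
    except Exception:
        # For example, a keyword in the repr that np.array does not accept.
        return False
    same_type = isinstance(b, np.ndarray) and b.dtype == a.dtype
    return bool(same_type and np.array_equal(a, b))


with np.printoptions(threshold=1000):
    small_ok = round_trips(np.arange(5.0))     # printed in full
    large_ok = round_trips(np.arange(1001.0))  # summarized with `...`

print(small_ok, large_ok)
```

The small array survives the round trip; the summarized one does not, whether or not a shape is appended to its repr.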

-CHB


On Mon, Sep 30, 2024 at 10:13 AM Marten van Kerkwijk wrote:

> Hi All,
>
> When the repr of an array is shown, currently the dtype and shape are
> explicitly listed if these cannot be directly inferred from the list
> that is shown, i.e., if the dtype is not float64 or int64, and if the
> size of the array is zero, but the shape not the simple (0,).
>
> For instance,
>
> ```
> np.empty((10,2,0), dtype="i2")
> array([], shape=(10, 2, 0), dtype=int16)
> ```
>
> I propose to also show the shape for the (rare) case that an array is
> summarized, i.e., when it has more than the default threshold of 1000
> elements, and elements are replaced by `...`.  The logic is that also in
> that case it is no longer clear what the shape actually is, which is
> useful information (e.g., if working in a notebook -- which is the
> original use case at https://github.com/numpy/numpy/issues/27461).
>
> I have a PR for that at https://github.com/numpy/numpy/pull/27482
> which would lead to the following:
>
> ```
> np.arange(1001)
> array([   0,    1,    2, ...,  998,  999, 1000], shape=(1001,))
> ```
>
> Just to be sure: this PR causes *no* change for any array with 1000 or
> fewer elements, so I do not believe this change will lead to a lot of
> unnecessary churn for downstream packages.  Indeed, between numpy and
> astropy (which has lots of doctests), the only changes to (doc)tests
> that were needed are the very few for arrays where the "threshold" is
> explicitly exceeded.
>
> One irritant is that the shape is not an argument that can be passed in
> to an `np.array` call.  While this is just as much the case for
> zero-sized arrays, perhaps a better solution would be to move the shape
> information out of the parentheses, e.g., using ``...)  # shape=(...)``.
> I can change the PR to do that if that's the consensus.
>
> All the best,
>
> Marten


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov