[Numpy-discussion] Re: Change in numpy.percentile

2023-10-11 Thread Peter Cock via NumPy-Discussion
On Tue, Oct 10, 2023 at 6:32 PM Matthew Brett 
wrote:

> Hi,
>
>
> On Tue, 10 Oct 2023 at 00:55, Andrew Nelson  wrote:
> >
> >
> > On Mon, 9 Oct 2023 at 23:50, Matthew Brett 
> wrote:
> >>
> >> Hi,
> >>
> >> On Mon, Oct 9, 2023 at 11:49 AM Andrew Nelson 
> wrote:
> >> Could you say more about why you consider:
> >> np.mean(x, dropna=True)
> >> to be less clear in intent than:
> >> np.nanmean(x)
> >> ?  Is it just that someone could accidentally forget that the default
> >
> >
> > The discussion isn't a deal breaker for me, I just wanted to put out a
> different POV.
> > The name of the function encodes what it does. By putting them both in
> the function name it's clear what the function does.
> >
> > ...
> >
> > Imagine that one has a large codebase and you have to find all the
> locations where nans could affect a mean. There may be lots of prod, sum,
> etc, also distributed within the codebase. You wouldn't want to search for
> `dropna` because you get every function that handles a nan. If you search
> for nanmean you only get the locations you want.
>
> So, is this more or less the difference between:
>
> grep 'np\.nanmean' *.py
>
> and
>
> grep 'np\.mean(.*,\s*dropna\s*=\s*True' *.py
>
> ?
>
> Cheers,
>
> Matthew
>
>
Keep in mind that the dropna argument might very well be on a different
line (especially with black formatting), so searches could be much harder
than looking for the nanmean function.
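
To illustrate the point, here is a small sketch in Python. It assumes a `dropna` keyword on `np.mean`, which is only the hypothetical spelling discussed in this thread and not an existing NumPy argument; the sample source and variable names are made up. The source is only searched as text, so NumPy does not need to be installed.

```python
import re

# Made-up sample source; the second call is wrapped the way black would wrap it.
SOURCE = '''\
a = np.mean(x, dropna=True)
b = np.mean(
    some_long_array_name,
    axis=0,
    dropna=True,
)
c = np.nanmean(x)
'''

# Same idea as the grep pattern quoted above, applied one line at a time.
LINE_PATTERN = r"np\.mean\(.*,\s*dropna\s*=\s*True"
line_hits = [ln for ln in SOURCE.splitlines() if re.search(LINE_PATTERN, ln)]

# Whole-text search: [^()] also matches newlines, so wrapped calls are found,
# but any argument containing its own parentheses defeats the pattern.
TEXT_PATTERN = r"np\.mean\([^()]*?\bdropna\s*=\s*True"
whole_hits = re.findall(TEXT_PATTERN, SOURCE)

print(len(line_hits), "hit(s) line by line;", len(whole_hits), "hit(s) over the whole text")
```

The line-by-line search sees only the one-line call, while the whole-text pattern also finds the wrapped one. The second pattern is still fragile, since it gives up as soon as an argument contains nested parentheses.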

(I do not deal with enough NaN data to have a strong view either way here)

Peter
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Change in numpy.percentile

2023-10-11 Thread Matthew Brett
Hi,

On Wed, Oct 11, 2023 at 9:17 AM Peter Cock via NumPy-Discussion
 wrote:
>
>
>
> On Tue, Oct 10, 2023 at 6:32 PM Matthew Brett  wrote:
>>
>> Hi,
>>
>>
>> On Tue, 10 Oct 2023 at 00:55, Andrew Nelson  wrote:
>> >
>> >
>> > On Mon, 9 Oct 2023 at 23:50, Matthew Brett  wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Mon, Oct 9, 2023 at 11:49 AM Andrew Nelson  wrote:
>> >> Could you say more about why you consider:
>> >> np.mean(x, dropna=True)
>> >> to be less clear in intent than:
>> >> np.nanmean(x)
>> >> ?  Is it just that someone could accidentally forget that the default
>> >
>> >
>> > The discussion isn't a deal breaker for me, I just wanted to put out a 
>> > different POV.
>> > The name of the function encodes what it does. By putting them both in the 
>> > function name it's clear what the function does.
>> >
>> > ...
>> >
>> > Imagine that one has a large codebase and you have to find all the 
>> > locations where nans could affect a mean. There may be lots of prod, sum, 
>> > etc, also distributed within the codebase. You wouldn't want to search for 
>> > `dropna` because you get every function that handles a nan. If you search 
>> > for nanmean you only get the locations you want.
>>
>> So, is this more or less the difference between:
>>
>> grep 'np\.nanmean' *.py
>>
>> and
>>
>> grep 'np\.mean(.*,\s*dropna\s*=\s*True' *.py
>>
>> ?
>>
>> Cheers,
>>
>> Matthew
>>
>
> Keep in mind that the dropna argument might very well be on a different
> line (especially with black formatting), so searches could be much harder
> than looking for the nanmean function.
>
> (I do not deal with enough NaN data to have a strong view either way here)

Sorry - yes - I had kept that in mind - it's just the regexp would
have been a bit more complex, and I was trying to get at the point
that:

a) We won't find ourselves wanting to do such a search very often (I
can't remember ever needing it), and
b) When we do have to do such a search, it's well within the wit of
man to do it.
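
One way to do such a search without a regexp is to parse the code instead. This is a minimal sketch, assuming the hypothetical `dropna` keyword from this thread (it is not an existing NumPy argument); `find_keyword_calls` and the sample source are illustrative names, and the function only recognises calls spelled literally as `np.mean(...)`.

```python
import ast

# Made-up sample source; it is parsed, never executed.
SAMPLE = '''\
a = np.mean(x, dropna=True)
b = np.mean(
    some_long_array_name,
    axis=0,
    dropna=True,
)
c = np.mean(x)
d = np.nanmean(x)
'''


def find_keyword_calls(source, module="np", func="mean", keyword="dropna"):
    """Return line numbers of calls like np.mean(..., dropna=True)."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        target = node.func
        is_target = (
            isinstance(target, ast.Attribute)
            and target.attr == func
            and isinstance(target.value, ast.Name)
            and target.value.id == module
        )
        if not is_target:
            continue
        for kw in node.keywords:
            # Only count the keyword when it is the literal constant True.
            if (
                kw.arg == keyword
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                lines.append(node.lineno)
    return sorted(lines)


hits = find_keyword_calls(SAMPLE)
print(hits)
```

Because the parser works on the syntax tree, line wrapping does not matter. Aliased imports such as `from numpy import mean` would need extra handling that this sketch leaves out.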

Cheers,

Matthew


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-11 Thread Mateusz Sokol
Hi! Thank you for all your feedback this week!

We have made a decision to take a less disruptive option that we considered
and that came up in this discussion.

We are backing out of the `NumpyUnpickler` class solution for reading pickles
across major NumPy versions.
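
For context, the kind of reader-side remapping that a dedicated unpickler class implies can be sketched with the standard library alone. This is an assumption about the general technique, not the code that was proposed for NumPy: `RemappingUnpickler`, `Payload`, `register` and the `legacy_demo` module names are all hypothetical stand-ins.

```python
import io
import pickle
import sys
import types


class Payload:
    """Stand-in for a library type (hypothetical, not a NumPy class)."""

    def __init__(self, value):
        self.value = value


def register(name, **attrs):
    """Create an in-memory module and make it importable by name."""
    module = types.ModuleType(name)
    vars(module).update(attrs)
    sys.modules[name] = module


register("legacy_demo")

# Write a pickle while the class still lives at the old path.
Payload.__module__ = "legacy_demo.core"
register("legacy_demo.core", Payload=Payload)
old_pickle = pickle.dumps(Payload(42))

# Simulate the rename: the old module disappears, a new one takes over.
del sys.modules["legacy_demo.core"]
Payload.__module__ = "legacy_demo._core"
register("legacy_demo._core", Payload=Payload)

MODULE_RENAMES = {"legacy_demo.core": "legacy_demo._core"}


class RemappingUnpickler(pickle.Unpickler):
    """Unpickler that redirects old module paths to their new location."""

    def find_class(self, module, name):
        return super().find_class(MODULE_RENAMES.get(module, module), name)


restored = RemappingUnpickler(io.BytesIO(old_pickle)).load()
print(restored.value)
```

The drawback of this style is that every reader has to opt in to the special unpickler; code that calls `pickle.load` directly gets no help.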

Instead, we will retain `numpy.core` stubs in NumPy 2.0 to allow loading
NumPy 1.x pickles.
Additionally, `numpy._core` stubs will be backported to 1.26 to ensure
compatibility the other way around - loading NumPy 2.0 pickles with NumPy
1.26 installed.

Both major versions will continue to create pickles with their own contents
(NumPy 1.26 with `numpy.core` paths and NumPy 2.0 with `numpy._core` paths).

This way any pickle will be loadable by both major versions.
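
The stub idea can be illustrated without NumPy. The sketch below is a toy model under stated assumptions: `demo_lib`, `Payload` and `register` are hypothetical names, and the in-memory modules only stand in for `numpy.core` and `numpy._core`. It shows that a pickle records the module path of the class, and that keeping a stub module at the old path is enough for plain `pickle.loads` to keep working.

```python
import pickle
import sys
import types


class Payload:
    """Stand-in for a library type (hypothetical, not a NumPy class)."""

    def __init__(self, value):
        self.value = value


def register(name, **attrs):
    """Create an in-memory module and make it importable by name."""
    module = types.ModuleType(name)
    vars(module).update(attrs)
    sys.modules[name] = module


register("demo_lib")

# "1.x" layout: the class is importable from demo_lib.core.
Payload.__module__ = "demo_lib.core"
register("demo_lib.core", Payload=Payload)
old_pickle = pickle.dumps(Payload(1))

# "2.0" layout: the implementation moves to demo_lib._core.
del sys.modules["demo_lib.core"]
Payload.__module__ = "demo_lib._core"
register("demo_lib._core", Payload=Payload)
new_pickle = pickle.dumps(Payload(2))

# Without a stub, the old pickle can no longer find its class.
try:
    pickle.loads(old_pickle)
    old_loads_without_stub = True
except ImportError:
    old_loads_without_stub = False

# With a stub at the old path, both pickles load with plain pickle.loads.
register("demo_lib.core", Payload=Payload)
restored_old = pickle.loads(old_pickle)
restored_new = pickle.loads(new_pickle)
print(old_loads_without_stub, restored_old.value, restored_new.value)
```

The reverse direction works the same way: a stub at the new path in the older release lets it read pickles written by the newer one.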


On Tue, Oct 10, 2023 at 3:33 PM Nathan  wrote:

>
>
> On Tue, Oct 10, 2023 at 7:03 AM Ronald van Elburg <
> r.a.j.van.elb...@hetnet.nl> wrote:
>
>> I have one more use case to consider from our ecosystem.
>>
>> We dump numpy arrays into a MongoDB using GridFS for subsequent
>> visualization, some snippets:
>>
>> '''Python
>> with BytesIO() as BIO:
>>     np.save(BIO, numpy_array)
>>     serialized_A = BIO.getvalue()
>> filehandle_id = self.representations_files.put(serialized_A)
>> '''
>>
>> and then restore them in the other application:
>>
>> '''Python
>> numpy_array = np.load(BytesIO(serializedA))
>> '''
>> For us this is for development work only and I am less concerned about
>> having mixed versions in my database, but in principle that is a scenario.
>> But it seems to me that for this to work the reading application needs to
>> be migrated to version 2 and temporarily extended with the NumpyUnpickler
>> before the writing application is migrated. Or they need to be migrated at
>> the same time. Is that correct?
>
>
> np.save and np.load will use NumpyUnpickler under the hood so you won’t
> have any issues, you would only have issues if you saved or loaded pickles
> using the pickle module directly.


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-11 Thread Nathan
On Wed, Oct 11, 2023 at 4:24 PM Mateusz Sokol  wrote:

> Hi! Thank you for all your feedback this week!
>
> We have made a decision to take a less disruptive option that we
> considered and that came up in this discussion.
>
> We back out of the `NumpyUnpickler` class solution for reading pickles
> across major NumPy versions.
>
> Instead, we will retain `numpy.core` stubs in NumPy 2.0 to allow loading
> NumPy 1.x pickles.
> Additionally, `numpy._core` stubs will be backported to 1.26 to ensure
> compatibility the other way around - loading NumPy 2.0 pickles with NumPy
> 1.26 installed.
>
> Both major versions will continue to create pickles with their own
> contents (NumPy 1.26 with `numpy.core` paths and NumPy 2.0 with
> `numpy._core` paths).
>
> This way any pickle will be loadable by both major versions.
>

Thanks for the summary Mateusz!

I want to add that there will still be module-level `__getattr__`
implementations that will raise deprecation warnings on any attribute
access in `np.core`, `numpy.core.multiarray` or
`numpy.core._multiarray_umath`, but direct imports will not generate any
warnings. Since pickles directly import types that appear in pickle files,
loading a pickle that refers to types or functions in these modules won’t
generate any warnings.
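
The mechanism can be sketched with in-memory modules. This is an illustration under assumptions, not NumPy's actual source: `demo_pkg`, `_warn_and_forward` and the two toy functions are hypothetical. Names placed directly in the stub's namespace are found by normal lookup, so the module-level `__getattr__` (and its warning) only fires for everything else.

```python
import types
import warnings

# "Real" implementation module (stand-in for the private package).
real = types.ModuleType("demo_pkg._core")
real._reconstruct = lambda: "rebuilt"
real.add = lambda a, b: a + b

# Stub at the old location. Names that pickles need are copied in
# explicitly, so looking them up never reaches __getattr__.
stub = types.ModuleType("demo_pkg.core")
stub._reconstruct = real._reconstruct


def _warn_and_forward(name):
    """Module-level __getattr__: warn, then forward to the real module."""
    warnings.warn(
        f"demo_pkg.core.{name} is private; use the public API instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return getattr(real, name)


stub.__getattr__ = _warn_and_forward

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    stub._reconstruct()                      # what unpickling would look up
    warnings_after_pickle_path = len(caught)
    stub.add(1, 2)                           # ordinary attribute access
    warnings_after_attribute_access = len(caught)

print(warnings_after_pickle_path, warnings_after_attribute_access)
```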

Searching on GitHub indicates that direct imports like this are relatively
rare in user code, which tends to either just import the top-level numpy
module and use attribute access or use `from` imports, which both invoke
the module-level `__getattr__`. Hopefully we’ll get most of the benefit of
alerting users that they are using private internals without needing to
break old pickles.


>
>
> On Tue, Oct 10, 2023 at 3:33 PM Nathan  wrote:
>
>>
>>
>> On Tue, Oct 10, 2023 at 7:03 AM Ronald van Elburg <
>> r.a.j.van.elb...@hetnet.nl> wrote:
>>
>>> I have one more use case to consider from our ecosystem.
>>>
>>> We dump numpy arrays into a MongoDB using GridFS for subsequent
>>> visualization, some snippets:
>>>
>>> '''Python
>>> with BytesIO() as BIO:
>>>     np.save(BIO, numpy_array)
>>>     serialized_A = BIO.getvalue()
>>> filehandle_id = self.representations_files.put(serialized_A)
>>> '''
>>>
>>> and then restore them in the other application:
>>>
>>> '''Python
>>> numpy_array = np.load(BytesIO(serializedA))
>>> '''
>>> For us this is for development work only and I am less concerned about
>>> having mixed versions in my database, but in principle that is a scenario.
>>> But it seems to me that for this to work the reading application needs to
>>> be migrated to version 2 and temporarily extended with the NumpyUnpickler
>>> before the writing application is migrated. Or they need to be migrated at
>>> the same time. Is that correct?
>>
>>
>> np.save and np.load will use NumpyUnpickler under the hood so you won’t
>> have any issues, you would only have issues if you saved or loaded pickles
>> using the pickle module directly.


[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-11 Thread Aaron Meurer
Is there a way to make pickle not depend on the specific submodule
that a class is defined in? Wouldn't this happen again if you ever
decided to rename _core?

The underscores in numpy._core._reconstruct don't actually do anything
here in terms of making the interface not public, and if anything, are
really misleading.

I'm also curious about this more generally. We tend to think of the
fully qualified name of a class as being an implementation detail for
many libraries and only the top-level lib.Name should be used, but
many things in the language (including this) break this.

Aaron Meurer

On Wed, Oct 11, 2023 at 4:49 PM Nathan  wrote:
>
>
>
> On Wed, Oct 11, 2023 at 4:24 PM Mateusz Sokol  wrote:
>>
>> Hi! Thank you for all your feedback this week!
>>
>> We have made a decision to take a less disruptive option that we considered 
>> and that came up in this discussion.
>>
>> We back out of the `NumpyUnpickler` class solution for reading pickles 
>> across major NumPy versions.
>>
>> Instead, we will retain `numpy.core` stubs in NumPy 2.0 to allow loading 
>> NumPy 1.x pickles.
>> Additionally, `numpy._core` stubs will be backported to 1.26 to ensure 
>> compatibility the other way around - loading NumPy 2.0 pickles with NumPy 
>> 1.26 installed.
>>
>> Both major versions will continue to create pickles with their own contents 
>> (NumPy 1.26 with `numpy.core` paths and NumPy 2.0 with `numpy._core` paths).
>>
>> This way any pickle will be loadable by both major versions.
>
>
> Thanks for the summary Mateusz!
>
> I want to add that there will still be module-level `__getattr__` 
> implementations that will raise deprecation warnings on any attribute access 
> in `np.core`, `numpy.core.multiarray` or `numpy.core._multiarray_umath`, but 
> direct imports will not generate any warnings. Since pickles directly import 
> types that appear in pickle files, loading a pickle that refers to types or 
> functions in these modules won’t generate any warnings.
>
> Searching on GitHub indicates that direct imports like this are relatively 
> rare in user code, which tends to either just import the top-level numpy 
> module and use attribute access or use `from` imports, which both invoke the 
> module-level `__getattr__`. Hopefully we’ll get most of the benefit of 
> alerting users that they are using private internals without needing to break 
> old pickles.
>
>>
>>
>>
>> On Tue, Oct 10, 2023 at 3:33 PM Nathan  wrote:
>>>
>>>
>>>
>>> On Tue, Oct 10, 2023 at 7:03 AM Ronald van Elburg 
>>>  wrote:

>>>> I have one more use case to consider from our ecosystem.
>>>>
>>>> We dump numpy arrays into a MongoDB using GridFS for subsequent
>>>> visualization, some snippets:
>>>>
>>>> '''Python
>>>> with BytesIO() as BIO:
>>>>     np.save(BIO, numpy_array)
>>>>     serialized_A = BIO.getvalue()
>>>> filehandle_id = self.representations_files.put(serialized_A)
>>>> '''
>>>>
>>>> and then restore them in the other application:
>>>>
>>>> '''Python
>>>> numpy_array = np.load(BytesIO(serializedA))
>>>> '''
>>>> For us this is for development work only and I am less concerned about
>>>> having mixed versions in my database, but in principle that is a scenario.
>>>> But it seems to me that for this to work the reading application needs to
>>>> be migrated to version 2 and temporarily extended with the NumpyUnpickler
>>>> before the writing application is migrated. Or they need to be migrated at
>>>> the same time. Is that correct?
>>>
>>>
>>> np.save and np.load will use NumpyUnpickler under the hood so you won’t 
>>> have any issues, you would only have issues if you saved or loaded pickles 
>>> using the pickle module directly.

[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0

2023-10-11 Thread Robert Kern
On Wed, Oct 11, 2023 at 11:50 PM Aaron Meurer  wrote:

> Is there a way to make pickle not depend on the specific submodule
> that a class is defined in?


No. `Unpickler` somehow has to locate the class/reconstruction function. It
will have to use the name given by the `__reduce_ex__` during pickling.
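
A small sketch of what that looks like in practice, using a standard-library type so that NumPy is not required. `Fraction` is only a convenient example; protocol 2 is chosen because it stores the reference as a single readable `GLOBAL` opcode (newer protocols store the same module and name with a different opcode).

```python
import pickle
import pickletools
from fractions import Fraction

value = Fraction(3, 4)

# __reduce_ex__ hands pickle a callable plus the arguments to call it with.
reconstructor, args = value.__reduce_ex__(2)[:2]

# The pickle stream does not contain the callable itself, only its
# module and name, which the Unpickler must import again on load.
data = pickle.dumps(value, protocol=2)
global_refs = [
    arg for op, arg, _pos in pickletools.genops(data) if op.name == "GLOBAL"
]

print(reconstructor.__module__, reconstructor.__qualname__)
print(global_refs)
```

If the module named in that reference stops being importable, loading fails, which is why a rename on the library side is visible to old files.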


> Wouldn't this happen again if you ever
> decided to rename _core?
>

Yes.


> The underscores in numpy._core._reconstruct don't actually do anything
> here in terms of making the interface not public, and if anything, are
> really misleading.
>

There's private and there's private. There are two broad things that could
mean:

1. We don't want users playing around with it, importing it directly in
their code and using it.
2. Marking that we won't mess around with it without going through the
deprecation policy.

Usually, these two go together (hiding it from users directly calling it
means that we get to mess around with it freely), but when we get to
metaprogramming tasks like pickling, they become a little decoupled.
Because the "user" that we have to think about isn't a person reading
documentation and browsing APIs, but a file that someone created years ago
and some infrastructure that interprets that file. I think it's good that
`_reconstruct` is hidden from human users and is distinct from almost all
of what constitutes "numpy's public API", but that doesn't mean that we
can't have a tiny third class of functions where we do preserve some
guarantees. It's only occasionally confusing to us core devs, not users
(who can rely on the "don't touch underscored names" rule just fine).

> I'm also curious about this more generally. We tend to think of the
> fully qualified name of a class as being an implementation detail for
> many libraries and only the top-level lib.Name should be used, but
> many things in the language (including this) break this.
>

Yes, and it's why these things attempt to be scoped in ways that limit this
problem. I.e. if you use pickling, you're told to use it only for transient
data with the same versions of libraries on both ends of the pipe, but the
reality is that it's too useful to avoid in creating files with arbitrarily
long lives. Not their fault; they warned us!

-- 
Robert Kern