[Numpy-discussion] Re: Change in numpy.percentile
On Tue, Oct 10, 2023 at 6:32 PM Matthew Brett wrote:
> Hi,
>
> On Tue, 10 Oct 2023 at 00:55, Andrew Nelson wrote:
> >
> > On Mon, 9 Oct 2023 at 23:50, Matthew Brett wrote:
> >>
> >> Hi,
> >>
> >> On Mon, Oct 9, 2023 at 11:49 AM Andrew Nelson wrote:
> >> Could you say more about why you consider:
> >> np.mean(x, dropna=True)
> >> to be less clear in intent than:
> >> np.nanmean(x)
> >> ? Is it just that someone could accidentally forget that the default
> >
> > The discussion isn't a deal breaker for me, I just wanted to put out a
> > different POV.
> > The name of the function encodes what it does. By putting them both in
> > the function name it's clear what the function does.
> >
> > ...
> >
> > Imagine that one has a large codebase and you have to find all the
> > locations where nans could affect a mean. There may be lots of prod, sum,
> > etc, also distributed within the codebase. You wouldn't want to search for
> > `dropna` because you get every function that handles a nan. If you search
> > for nanmean you only get the locations you want.
>
> So, is this more or less the difference between:
>
> grep 'np\.nanmean' *.py
>
> and
>
> grep 'np\.mean(.*,\s*dropna\s*=\s*True' *.py
>
> ?
>
> Cheers,
>
> Matthew

Keep in mind that the dropna argument might very well be on a different line (especially with black formatting), so searches could be much harder than looking for the nanmean function.

(I do not deal with enough NaN data to have a strong view either way here)

Peter

___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com
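Peter's concern about wrapped arguments goes away if the search runs over the parsed syntax tree rather than the raw text. A minimal sketch using the standard-library `ast` module; note that `dropna=` is the hypothetical keyword under discussion (NumPy's `np.mean` has no such argument), and the sketch assumes NumPy is imported as `np`.

```python
import ast

SOURCE = '''
import numpy as np

a = np.nanmean(x)
b = np.mean(
    x,
    dropna=True,
)
c = np.mean(x)
'''


def find_dropna_means(source):
    """Return line numbers of np.mean(..., dropna=True) calls, however they are wrapped."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only match the attribute call np.mean(...).
        if not (
            isinstance(func, ast.Attribute)
            and func.attr == "mean"
            and isinstance(func.value, ast.Name)
            and func.value.id == "np"
        ):
            continue
        # Look for the (hypothetical) keyword argument dropna=True.
        for kw in node.keywords:
            if (
                kw.arg == "dropna"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                hits.append(node.lineno)
    return hits


print(find_dropna_means(SOURCE))  # [5]
```

Because the match is on the parsed call rather than on text, keyword order and line wrapping do not matter; aliases other than `np` would need extra handling.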
[Numpy-discussion] Re: Change in numpy.percentile
Hi,

On Wed, Oct 11, 2023 at 9:17 AM Peter Cock via NumPy-Discussion wrote:
> Keep in mind that the dropna argument might very well be on a different
> line (especially with black formatting), so searches could be much harder
> than looking for the nanmean function.
>
> (I do not deal with enough NaN data to have a strong view either way here)

Sorry - yes - I had kept that in mind - it's just that the regexp would have been a bit more complex, and I was trying to get at the point that:

a) We won't find ourselves wanting to do such a search very often (I can't remember ever needing it), and
b) When we do have to do such a search, it's well within the wit of man to do it.
Cheers,

Matthew
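The "bit more complex" regexp might look something like this. A minimal sketch using Python's `re` module; `dropna` is still the hypothetical keyword from the proposal, not an existing NumPy argument. The negated character class `[^()]` also matches newlines, so calls that a formatter splits over several lines are still found.

```python
import re

# [^()]* matches anything except parentheses, including newlines, so the
# keyword may sit on a later line than "np.mean(".
DROPNA_MEAN = re.compile(r"np\.mean\([^()]*\bdropna\s*=\s*True\b")

code = """
b = np.mean(
    x,
    dropna=True,
)
c = np.mean(x)
d = np.nanmean(x)
"""

matches = DROPNA_MEAN.findall(code)
print(len(matches))  # 1
```

The trade-off is robustness: nested parentheses in the argument list, as in `np.mean(np.abs(x), dropna=True)`, defeat this pattern, which is where a parser-based search starts to earn its extra code.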
[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0
Hi!

Thank you for all your feedback this week!

We have made a decision to take a less disruptive option that we considered and that came up in this discussion.

We are backing out of the `NumpyUnpickler` class solution for reading pickles across major NumPy versions.

Instead, we will retain `numpy.core` stubs in NumPy 2.0 to allow loading NumPy 1.x pickles. Additionally, `numpy._core` stubs will be backported to 1.26 to ensure compatibility the other way around: loading NumPy 2.0 pickles with NumPy 1.26 installed.

Both major versions will continue to create pickles with their own contents (NumPy 1.26 with `numpy.core` paths and NumPy 2.0 with `numpy._core` paths). This way any pickle will be loadable by both major versions.

On Tue, Oct 10, 2023 at 3:33 PM Nathan wrote:
> On Tue, Oct 10, 2023 at 7:03 AM Ronald van Elburg <r.a.j.van.elb...@hetnet.nl> wrote:
>> I have one more use case to consider from our ecosystem.
>>
>> We dump numpy arrays into a MongoDB using GridFS for subsequent visualization, some snippets:
>>
>> ```python
>> from io import BytesIO
>> import numpy as np
>>
>> with BytesIO() as BIO:
>>     np.save(BIO, numpy_array)
>>     serialized_A = BIO.getvalue()  # read the bytes before the buffer closes
>> filehandle_id = self.representations_files.put(serialized_A)  # store in GridFS
>> ```
>>
>> and then restore them in the other application:
>>
>> ```python
>> numpy_array = np.load(BytesIO(serialized_A))  # serialized_A: bytes fetched back from GridFS
>> ```
>>
>> For us this is for development work only and I am less concerned about having mixed versions in my database, but in principle that is a scenario. But it seems to me that for this to work the reading application needs to be migrated to version 2 and temporarily extended with the NumpyUnpickler before the writing application is migrated. Or they need to be migrated at the same time. Is that correct?
>
> np.save and np.load will use NumpyUnpickler under the hood so you won't have any issues; you would only have issues if you saved or loaded pickles using the pickle module directly.
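For arrays like the ones in Ronald's snippet, the distinction Nathan draws can be checked directly: the `.npy` format stores plain-dtype arrays as a header plus raw bytes, so `pickle` only enters the picture for object arrays, and recent NumPy versions' `np.load` refuses those unless `allow_pickle=True` is passed. A minimal sketch, assuming a NumPy version where `np.load` defaults to `allow_pickle=False`; the array contents are made up for illustration.

```python
from io import BytesIO

import numpy as np

# Numeric dtype: stored as header + raw bytes, no pickle involved.
numeric = np.arange(6, dtype=np.float64).reshape(2, 3)
buf = BytesIO()
np.save(buf, numeric)
buf.seek(0)
restored = np.load(buf)  # works with the default allow_pickle=False
print(np.array_equal(restored, numeric))  # True

# Object dtype: the payload is a pickle, so loading must be opted into.
objects = np.array([{"a": 1}, None], dtype=object)
buf = BytesIO()
np.save(buf, objects)  # np.save permits pickling by default
buf.seek(0)
try:
    np.load(buf)
except ValueError as exc:
    print("refused:", exc)

buf.seek(0)
restored_objects = np.load(buf, allow_pickle=True)
```

Only the object-array path is exposed to module-path changes inside pickles; purely numeric `.npy` blobs carry no Python import paths at all.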
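The stub plan announced in this message works because a pickle does not store classes or functions themselves, only the dotted module path and name used to look them up again at load time. A minimal, self-contained sketch of that mechanism in plain Python, assuming hypothetical in-memory modules `demo_core` and `demo__core` as stand-ins for `numpy.core` and `numpy._core`; it is not NumPy's actual stub code.

```python
import pickle
import sys
import types


def make_module(name, source):
    """Create an in-memory module named `name` and register it in sys.modules."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)  # classes defined here get __module__ == name
    sys.modules[name] = module
    return module


THING_SRC = (
    "class Thing:\n"
    "    def __init__(self, value):\n"
    "        self.value = value\n"
)

# "Version 1": the class lives in demo_core (stand-in for numpy.core).
v1 = make_module("demo_core", THING_SRC)
old_blob = pickle.dumps(v1.Thing(42))  # records the path "demo_core" + "Thing"

# "Version 2": the class moves to demo__core and demo_core disappears.
del sys.modules["demo_core"]
v2 = make_module("demo__core", THING_SRC)

try:
    pickle.loads(old_blob)
except ModuleNotFoundError as exc:
    print("without stub:", exc)

# Keep a thin stub at the old path that re-exports the moved class.
stub = types.ModuleType("demo_core")
stub.Thing = v2.Thing
sys.modules["demo_core"] = stub

restored = pickle.loads(old_blob)
print(type(restored).__module__, restored.value)  # demo__core 42
```

New pickles written after the move record `demo__core`, which is why the plan also needs the reverse stub in the older release.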
[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0
On Wed, Oct 11, 2023 at 4:24 PM Mateusz Sokol wrote:
> Hi! Thank you for all your feedback this week!
> [...]
> This way any pickle will be loadable by both major versions.

Thanks for the summary Mateusz!

I want to add that there will still be module-level `__getattr__` implementations that will raise deprecation warnings on any attribute access in `np.core`, `numpy.core.multiarray` or `numpy.core._multiarray_umath`, but direct imports will not generate any warnings. Since pickles directly import types that appear in pickle files, loading a pickle that refers to types or functions in these modules won't generate any warnings.

Searching on GitHub indicates that direct imports like this are relatively rare in user code, which tends either to import the top-level numpy module and use attribute access or to use `from` imports, both of which invoke the module-level `__getattr__`. Hopefully we'll get most of the benefit of alerting users that they are using private internals without needing to break old pickles.
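A sketch of one way to get the split described here, using a module-level `__getattr__` (PEP 562). Everything in it is hypothetical: `demo_legacy` stands in for a legacy module such as `numpy.core.multiarray`, and `impl` for the relocated implementation. Names bound directly in the stub module are found by ordinary lookup, which is what an unpickler performs after importing the module, while any other name falls through to `__getattr__` and emits a `DeprecationWarning`.

```python
import sys
import types
import warnings

# Hypothetical stand-in for the relocated implementation module.
impl = types.SimpleNamespace(
    reconstruct=lambda *args: ("reconstructed", args),
    helper=lambda x: x * 2,
)

# Hypothetical stand-in for a legacy module kept at the old import path.
legacy = types.ModuleType("demo_legacy")

# Names that old pickles refer to are bound directly, so lookup is silent.
legacy.reconstruct = impl.reconstruct


def _legacy_getattr(name):
    """PEP 562 hook: only called when normal attribute lookup on the module fails."""
    if hasattr(impl, name):
        warnings.warn(
            f"demo_legacy.{name} is deprecated; import it from its new location",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(impl, name)
    raise AttributeError(f"module 'demo_legacy' has no attribute {name!r}")


legacy.__getattr__ = _legacy_getattr
sys.modules["demo_legacy"] = legacy

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from demo_legacy import reconstruct  # found in the module dict: no warning
    from demo_legacy import helper  # falls through to __getattr__: warns

print(len(caught), caught[0].category.__name__)  # 1 DeprecationWarning
```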
[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0
Is there a way to make pickle not depend on the specific submodule that a class is defined in? Wouldn't this happen again if you ever decided to rename _core?

The underscores in numpy._core._reconstruct don't actually do anything here in terms of making the interface not public, and if anything, are really misleading.

I'm also curious about this more generally. We tend to think of the fully qualified name of a class as being an implementation detail for many libraries and only the top-level lib.Name should be used, but many things in the language (including this) break this.

Aaron Meurer

On Wed, Oct 11, 2023 at 4:49 PM Nathan wrote:
> Thanks for the summary Mateusz!
>
> I want to add that there will still be module-level `__getattr__`
> implementations [...]
[Numpy-discussion] Re: Adding NumpyUnpickler to Numpy 1.26 and future Numpy 2.0
On Wed, Oct 11, 2023 at 11:50 PM Aaron Meurer wrote:
> Is there a way to make pickle not depend on the specific submodule
> that a class is defined in?

No. `Unpickler` somehow has to locate the class/reconstruction function. It will have to use the name given by the `__reduce_ex__` during pickling.

> Wouldn't this happen again if you ever
> decided to rename _core?

Yes.

> The underscores in numpy._core._reconstruct don't actually do anything
> here in terms of making the interface not public, and if anything, are
> really misleading.

There's private and there's private. There are two broad things that could mean:

1. We don't want users playing around with it, importing it directly in their code and using it.
2. Marking that we won't mess around with it without going through the deprecation policy.

Usually, these two go together (hiding it from users directly calling it means that we get to mess around with it freely), but when we get to metaprogramming tasks like pickling, they become a little decoupled, because the "user" that we have to think about isn't a person reading documentation and browsing APIs, but a file that someone created years ago and some infrastructure that interprets that file. I think it's good that `_reconstruct` is hidden from human users and is distinct from almost all of what constitutes "numpy's public API", but that doesn't mean that we can't have a tiny third class of functions where we do preserve some guarantees. It's only occasionally confusing to us core devs, not users (who can rely on the "don't touch underscored names" rule just fine).

> I'm also curious about this more generally. We tend to think of the
> fully qualified name of a class as being an implementation detail for
> many libraries and only the top-level lib.Name should be used, but
> many things in the language (including this) break this.

Yes, and it's why these things attempt to be scoped in ways that limit this problem. I.e. if you use pickling, you're told to use it only for transient data with the same versions of libraries on both ends of the pipe, but the reality is that it's too useful to avoid in creating files with arbitrarily long lives. Not their fault; they warned us!

--
Robert Kern
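The point that the unpickler can only use the names recorded at pickling time is easy to see by listing what a pickled array references. A minimal sketch, assuming NumPy is installed; protocol 2 is chosen only because its `GLOBAL` opcode stores module and name as one readable `"module name"` string (newer protocols record the same strings via `STACK_GLOBAL`). Which module path appears for `_reconstruct` depends on the installed NumPy version.

```python
import pickle
import pickletools

import numpy as np

# Each GLOBAL opcode names a "module qualname" pair that the unpickler
# must be able to import later; nothing else about the function is stored.
blob = pickle.dumps(np.arange(3), protocol=2)

referenced = [
    arg for opcode, arg, _pos in pickletools.genops(blob) if opcode.name == "GLOBAL"
]
for ref in referenced:
    print(ref)
```

Renaming any module in that list without leaving something importable at the old path would make this blob unloadable, which is the situation the `numpy.core` stubs are meant to avoid.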