[Numpy-discussion] Type annotation for Numpy arrays, accelerators and numpy.typing
Hi,

When Numpy 1.20 was released, I discovered numpy.typing and its
documentation: https://numpy.org/doc/stable/reference/typing.html

I know that it is very new, but I'm a bit lost. A good API for describing
array types would be useful not only for type checkers but also for Python
accelerators using ndarrays (in particular Pythran, Numba, Cython and
Transonic).

For Transonic, I'd like to be able to use numpy.typing internally to get a
better implementation of what we need in transonic.typing (in particular,
one compatible with type checkers like MyPy). However, it seems that I
can't do anything with what I see today in numpy.typing.

For Python-Numpy accelerators, we need to be able to define precise array
types to limit compilation time and to give useful hints for optimizations
(ndim, partial or full shape). We also need fused types.

What can be done with Transonic is described on these pages:
https://transonic.readthedocs.io/en/latest/examples/type_hints.html and
https://transonic.readthedocs.io/en/latest/generated/transonic.typing.html

I think it would be good to be able to do things like that with
numpy.typing. It may already be possible, but I can't find how in the doc.

I can give a few examples here. First, a very simple one:

from transonic import Array

Af3d = Array[float, "3d"]

# Note that this can also be written without Array, just as
Af3d = "float[:,:,:]"

# same thing but only contiguous C ordered
Af3d = Array[float, "3d", "C"]

Note: being able to limit the compilation to C-ordered arrays only is very
important, since it can drastically decrease compilation time and memory
use, and some numerical kernels are in any case written to be efficient
only with C (or Fortran) ordered arrays.

# 2d color image
A_im = Array[np.int16, "[:,:,3]"]

Now, fused types. This example is taken from a real-life case
(https://foss.heptapod.net/fluiddyn/fluidsim/-/blob/branch/default/fluidsim/base/time_stepping/pseudo_spect.py),
so it's really useful in practice.

from transonic import Type, NDim, Array, Union

N = NDim(2, 3, 4)
A = Array[np.complex128, N, "C"]
Am1 = Array[np.complex128, N - 1, "C"]

N123 = NDim(1, 2, 3)
A123c = Array[np.complex128, N123, "C"]
A123f = Array[np.float64, N123, "C"]

T = Type(np.float64, np.complex128)
A1 = Array[T, N, "C"]
A2 = Array[T, N - 1, "C"]
ArrayDiss = Union[A1, A2]

To summarize: type annotations are, and will continue to be, used for
Python-Numpy accelerators. It would be good to also consider this
application when designing numpy.typing.

Cheers,
Pierre
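[A minimal sketch of how such constraints could be carried by the standard
typing machinery available today, assuming only typing.Annotated (Python
3.9+). This is not an existing numpy.typing feature; the metadata strings
and the grad_x helper are purely illustrative.]

import numpy as np
from typing import Annotated

# Hypothetical spelling: Annotated lets extra metadata (dtype, ndim,
# memory order) ride along with np.ndarray without new typing machinery.
Af3d = Annotated[np.ndarray, np.float64, "ndim=3", "order=C"]

def grad_x(arr: Af3d) -> np.ndarray:
    # A type checker sees plain np.ndarray; an accelerator could read
    # Af3d.__metadata__ at decoration time to pick a compiled kernel.
    return arr

print(Af3d.__metadata__)  # (<class 'numpy.float64'>, 'ndim=3', 'order=C')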
Re: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface
Hello again,

On Mon, 15 Feb 2021 at 16:57, Sebastian Berg wrote:
>
> On Mon, 2021-02-15 at 10:12 +0100, Friedrich Romstedt wrote:
> > Last week I updated my example code to be more slim. There now exists
> > a single-file extension module:
> > https://github.com/friedrichromstedt/bughunting-01/blob/master/lib/bughuntingfrmod/bughuntingfrmod.cpp
> > The corresponding test program
> > https://github.com/friedrichromstedt/bughunting-01/blob/master/test/2021-02-11_0909.py
> > crashes "properly" both on Windows 10 (Python 3.8.2, numpy 1.19.2)
> > as well as on Arch Linux (Python 3.9.1, numpy 1.20.0), when the
> > ``print`` statement contained in the test file is commented out.
>
> I have tried it out, and can confirm that using debugging tools (namely
> valgrind) will allow you to track down the issue (valgrind reports it
> from within Python; running a Python without debug symbols may
> obfuscate the actual problem; if that is limiting you, I can post my
> valgrind output). Since you are running a Linux system, I am confident
> that you can run it in valgrind to find it yourself. (There may be
> other ways.)
>
> Just remember to run valgrind with `PYTHONMALLOC=malloc valgrind` and
> ignore some errors, e.g. when importing NumPy.

From running ``PYTHONMALLOC=malloc valgrind python3 2021-02-11_0909.py``
(with the preceding call of ``print`` in :file:`2021-02-11_0909.py`
commented out) I found a few things:

- The call might or might not succeed. It doesn't always lead to a
  segfault.
- "at 0x4A64A73: ??? (in /usr/lib/libpython3.9.so.1.0), called by
  0x4A64914: PyMemoryView_FromObject (in /usr/lib/libpython3.9.so.1.0)",
  a "Conditional jump or move depends on uninitialised value(s)". After
  one more block of valgrind output ("Use of uninitialised value of size
  8 at 0x48EEA1B: ??? (in /usr/lib/libpython3.9.so.1.0)"), it finally
  leads either to "Invalid read of size 8 at 0x48EEA1B: ??? (in
  /usr/lib/libpython3.9.so.1.0) [...] Address 0x1 is not stack'd,
  malloc'd or (recently) free'd", resulting in a segfault, or just to
  another "Use of uninitialised value of size 8 at 0x48EEA15: ??? (in
  /usr/lib/libpython3.9.so.1.0)", after which the program completes
  successfully.
- All this happens within "PyMemoryView_FromObject".

So I can only guess that the "uninitialised value" is compared to 0x0,
and when it is different (e.g. 0x1), it leads via "Address 0x1 is not
stack'd, malloc'd or (recently) free'd" to the segfault observed.

I suppose I need to compile Python and numpy myself to see the debug
symbols instead of the "???" marks? Maybe even with ``-O0``?

Furthermore, the shared object belonging to my code isn't involved
directly in any way, so the segfault possibly has to do with some data I
am leaving "uninitialised" at the moment.

Thanks for the other replies as well; for the moment I feel that going
the valgrind way might teach me how to debug errors of this kind myself.

So far,
Friedrich
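[A hedged Python-level probe for this kind of bug, where `exporter` is a
hypothetical stand-in for an instance of the C++ extension type.
memoryview() goes through the same PyMemoryView_FromObject path that
np.asarray() hits, so calling this in a loop under valgrind exercises the
buffer export in isolation from NumPy.]

import numpy as np

def probe(exporter):
    # If Py_buffer fields such as suboffsets are left uninitialized by
    # the exporter, the values printed here may vary from run to run.
    m = memoryview(exporter)
    print(m.shape, m.strides, m.suboffsets, m.format)
    return np.asarray(m)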
Re: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface
I've reproduced the error you've described and got rid of it without
valgrind. Those two lines are enough to avoid the segfault. But feel free
to find it yourself :)

Best regards,
Lev

On Tue, Feb 16, 2021 at 5:02 PM Friedrich Romstedt wrote:
> Hello again,
> [...]
> So I can only guess that the "uninitialised value" is compared to 0x0,
> and when it is different (e.g. 0x1), it leads via "Address 0x1 is not
> stack'd, malloc'd or (recently) free'd" to the segfault observed.
> [...]
> Thanks for the other replies as well; for the moment I feel that going
> the valgrind way might teach me how to debug errors of this kind
> myself.
>
> So far,
> Friedrich
Re: [Numpy-discussion] Type annotation for Numpy arrays, accelerators and numpy.typing
On Tue, Feb 16, 2021 at 10:20 AM PIERRE AUGIER wrote:
> Hi,
>
> When Numpy 1.20 was released, I discovered numpy.typing and its
> documentation https://numpy.org/doc/stable/reference/typing.html
>
> I know that it is very new, but I'm a bit lost. [...]
>
> For Python-Numpy accelerators, we need to be able to define precise
> array types to limit the compilation time and give useful hints for
> optimizations (ndim, partial or full shape). We also need fused types.

Hi Pierre,

I think what you are getting at is that ArrayLike isn't useful for
accelerators, right? ArrayLike is needed to add annotations to functions
that use np.asarray to coerce their inputs, which may be scalars, lists,
etc. That's indeed never what you want for an accelerator, and it'd be
great if people stopped writing that kind of code, but we're stuck with a
lot of it in SciPy and many other downstream libraries.

For your purposes, I think you want one of two things:

1. functions that only take `ndarray`, or maybe at most
   `Union[float, ndarray]`
2. perhaps in the future, a well-defined array Protocol, to support
   multiple array types (this is hinted at in
   https://data-apis.github.io/array-api/latest/design_topics/static_typing.html)

You don't need numpy.typing for (1); you can directly annotate with
`x: np.ndarray`.

> What can be done with Transonic is described in these pages:
> https://transonic.readthedocs.io/en/latest/examples/type_hints.html and
> https://transonic.readthedocs.io/en/latest/generated/transonic.typing.html
>
> I think it would be good to be able to do things like that with
> numpy.typing. It may already be possible, but I can't find how in the
> doc.

Two things that are still work in progress are annotating arrays with
dtypes and with shapes. Your examples already have that, so that's useful
input. For C/F-contiguity, I believe that's useful, but it normally
shouldn't show up in user-facing APIs (only in internal helper routines),
so it's probably less urgent.

For dtype annotations, a lot of work is being done at the moment by Bas
van Beek. Example: https://github.com/numpy/numpy/pull/18128. That all
turns out to be quite complex, because there are so many valid ways of
specifying a dtype. It's the same kind of flexibility problem as with
`asarray`: the complexity is needed to correctly type current code in
NumPy, SciPy et al., but it's not what you want for an accelerator. For
that you'd want to accept only one way of spelling this, `dtype=`.

> I can give a few examples here. [...]
>
> # 2d color image
> A_im = Array[np.int16, "[:,:,3]"]
>
> Now, fused types. This example is taken from a real-life case
> (https://foss.heptapod.net/fluiddyn/fluidsim/-/blob/branch/default/fluidsim/base/time_stepping/pseudo_spect.py),
> so it's really useful in practice.

Yes, definitely useful; there's also a lot of Cython code in downstream
libraries that shows this. Annotations for fused types, when dtypes are
just type literals, should hopefully work out of the box with TypeVar,
without us having to do anything special in numpy.

Cheers,
Ralf
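[A minimal sketch of that TypeVar point, assuming the dtypes are used as
plain type literals; the damp function is illustrative only.]

import numpy as np
from typing import TypeVar

# A constrained TypeVar plays the role of a Cython/Transonic fused type:
# each call site must resolve T to exactly one of the listed types.
T = TypeVar("T", np.float64, np.complex128)

def damp(coef: T, value: T) -> T:
    # MyPy binds T per call: both arguments must share one of the two
    # allowed scalar types, and the result keeps that type.
    return coef * value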
Re: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface
Hi Lev,

On Tue, 16 Feb 2021 at 11:50, Lev Maximov wrote:
>
> I've reproduced the error you've described and got rid of it without
> valgrind. Those two lines are enough to avoid the segfault.

Okay, good to know, I'll try it! Thanks for looking into it.

> But feel free to find it yourself :)

Yes :-D

Best wishes,
Friedrich
Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function
I'm getting a generally lukewarm, though not negative, response. Should we
put it to a vote?

- Joe

On Fri, Feb 12, 2021, 16:06 Robert Kern wrote:
> On Fri, Feb 12, 2021 at 3:42 PM Ralf Gommers wrote:
>>
>> On Fri, Feb 12, 2021 at 9:21 PM Robert Kern wrote:
>>>
>>> On Fri, Feb 12, 2021 at 1:47 PM Ralf Gommers wrote:
>>>>
>>>> On Fri, Feb 12, 2021 at 7:25 PM Sebastian Berg wrote:
>>>>>
>>>>> Right, my initial feeling is that without such context `atleast_3d`
>>>>> is pretty surprising. So I wonder if we can design `atleast_nd` in
>>>>> a way that is explicit about this context.
>>>>
>>>> Agreed. I think such a use case is probably too specific to design a
>>>> single function for, at least in such a hardcoded way.
>>>
>>> That might be an argument for not designing a new one (or at least
>>> not giving it such a name). Not sure it's a good argument for
>>> removing a long-standing one.
>>
>> I agree. I'm not sure deprecating is best. But introducing new
>> functionality where `nd(pos=3) != 3d` is also not great.
>>
>> At the very least, atleast_3d should be better documented. It is also
>> telling that Juan (a long-time scikit-image dev) doesn't like
>> atleast_3d, and there's very little usage of it in scikit-image.
>
> I'm fairly neutral on atleast_nd(). I think that for n=1 and n=2, you
> can derive The One Way to Do It from broadcasting semantics, but for
> n>=3, I'm not sure there's much value in trying to systematize it to a
> single convention. I think that once you get up to those dimensions,
> you start to want domain-specific semantics. I do agree that, in
> retrospect, atleast_3d() probably should have been named more
> specifically. It was of a piece with other conveniences like dstack()
> that did special things to support channel-last images (and implicitly
> treated 3D arrays as such). For example, for DL frameworks that
> assemble channeled images into minibatches (with different conventions
> like BHWC and BCHW), you'd want the n=4 behavior to do different
> things. I _think_ you'd want to do those with different functions
> rather than with a complicated set of arguments to one function.
>
> --
> Robert Kern
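[To make the naming question concrete, a sketch of the broadcasting-derived
convention for a generic atleast_nd, prepending length-1 axes on the left
as atleast_1d and atleast_2d do; this is one possible semantics under
discussion, not an agreed proposal.]

import numpy as np

def atleast_nd(ary, ndim):
    # Prepend length-1 axes until `ndim` is reached, matching how
    # broadcasting aligns shapes from the right.
    ary = np.asarray(ary)
    if ary.ndim >= ndim:
        return ary
    return ary.reshape((1,) * (ndim - ary.ndim) + ary.shape)

print(atleast_nd(np.ones(3), 3).shape)   # (1, 1, 3)
print(np.atleast_3d(np.ones(3)).shape)   # (1, 3, 1), the image-oriented placement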
Re: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface
On Tue, 2021-02-16 at 12:40 +0100, Friedrich Romstedt wrote:
> Hi Lev,
>
> On Tue, 16 Feb 2021 at 11:50, Lev Maximov wrote:
> >
> > I've reproduced the error you've described and got rid of it without
> > valgrind. Those two lines are enough to avoid the segfault.
>
> Okay, good to know, I'll try it! Thanks for looking into it.

Yeah, sorry if I was too fuzzy. Your error was random, and checking with
valgrind in that case is often helpful and typically quick (it runs
slowly, but not much preparation is needed). Especially because you
reported it succeeding sometimes, where "uninitialized" might help,
although I guess a `gdb` backtrace in the crash case might have been just
as clear.

With debugging symbols in Python (a full debug build makes sense), it
mentioned "suboffsets" in a function name for me (maybe when a crash
happened). A debug Python will also default to a debug malloc:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONMALLOC
That would not have been very useful here, but could be if you access a
Python object after it was free'd, for example.

Uninitialized + "suboffsets" seemed fairly clear, but I may have
underestimated it a lot, because I recognize "suboffsets" for buffers
immediately.

Cheers,

Sebastian

> > But feel free to find it yourself :)
>
> Yes :-D
>
> Best wishes,
> Friedrich
[Numpy-discussion] What to do about structured string dtype and string regression?
Hi all,

In https://github.com/numpy/numpy/issues/18407 it was reported that there
is a regression for `np.array()` and friends in NumPy 1.20 for code such
as:

np.array(["1234"], dtype=("U1", 4))
# NumPy 1.20: array(['1', '1', '1', '1'], dtype='<U1')
# NumPy 1.19: array(['1', '2', '3', '4'], dtype='<U1')


The Basics
----------

This happens when you ask for a rare "subarray" dtype; ways to create it
are:

np.dtype(("U1", 4))
np.dtype("(4)U1,")  # (does not have a field, only a subarray)

Both of these give the same subarray dtype: a "U1" dtype with shape 4.
One thing to know about these dtypes is that they cannot be attached to
an array:

np.zeros(3, dtype="(4)U1,").dtype == "U1"
np.zeros(3, dtype="(4)U1,").shape == (3, 4)

I.e. the shape is moved/added into the array itself (instead of remaining
part of the dtype).


The Change
----------

Now, what/why did something change? When filling subarray dtypes, NumPy
normally fills every element with the same input. In the above case, NumPy
will usually give the 1.20 result because it assigns "1234" to every
subarray element individually; maybe confusingly, this truncates so that
only the "1" is actually assigned. We can prove it with a structured dtype
(same result in 1.19 and 1.20):

>>> np.array(["1234"], dtype="(4)U1,i")
array([(['1', '1', '1', '1'], 1234)],
      dtype=[('f0', '<U1', (4,)), ('f1', '<i4')])

Another, weirder case which changed (more obviously for the better) is:

>>> np.array("1234", dtype="(4)U1,")
# NumPy 1.20: array(['1', '1', '1', '1'], dtype='<U1')
# NumPy 1.19: array(['1', '', '', ''], dtype='<U1')

And, to point it out, we can have subarrays that are not 1-D:

>>> np.array(["12"], dtype=("(2,2)U1,"))
array([[['1', '1'],
        ['2', '2']]], dtype='<U1')


The Cause
---------

The cause of the 1.19 behaviour is two-fold:

1. The "subarray" part of the dtype is moved into the array after the
   dimension is found. At this point strings are always considered
   "scalars". In most of the above examples, the new array shape is
   (1,) + (4,).

2. When filling the new array with values, it now has an _additional_
   dimension! Because of this, the string is suddenly considered a
   sequence, so it behaves the same as if `list("1234")` had been passed.
   Although, normally, NumPy would never consider a string a sequence.


The Solution?
-------------

I honestly don't have one. We could consider strings as sequences in this
weird special case. That will probably create other weird special cases,
but they would be even more hidden (I expect mainly odder things throwing
an error).

Should we try to document this better in the release notes, or can we
think of some better (or at least louder) solution?

Cheers,

Sebastian
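[For comparison, both behaviours can be requested unambiguously without a
subarray dtype at all; a short sketch of the two explicit spellings.]

import numpy as np

# 1.20-style: broadcast one (truncated) scalar into every element.
a = np.empty((1, 4), dtype="U1")
a[...] = "1234"   # every element becomes "1" (silent truncation to U1)

# 1.19-style: split the string explicitly, one character per element.
b = np.array([list("1234")], dtype="U1")   # [['1', '2', '3', '4']]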
Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function
On Tue, Feb 16, 2021, at 07:49, Joseph Fox-Rabinovitz wrote:
> I'm getting a generally lukewarm, though not negative, response. Should
> we put it to a vote?

Things here don't typically get decided by vote; I think you'll have to
build towards consensus. It may be overkill to write a NEP, but outlining
a proposed solution along with pros and cons, and getting everyone on
board, is necessary. The API surface is a touchy issue, and so it is
difficult to get new features like these added.

Ralf has been working towards this idea: having a well-organised namespace
of utility functions outside of the core NumPy API would be helpful in
allowing expansion and experimentation, without making the current
situation worse (where we effectively have to support things forever). As
an example, take Cartesian products [0] and array combinations [1], which
have been requested several times on StackOverflow, but for which there's
nowhere to put an implementation.

Stéfan

[0] https://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays#comment22769580_1235363
[1] https://stackoverflow.com/questions/16003217/n-d-version-of-itertools-combinations-in-numpy/16008578#16008578
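[For reference, a minimal sketch of the Cartesian-product utility in
question, using one common idiom from the linked threads (an open grid
stacked and reshaped); the function name is illustrative.]

import numpy as np

def cartesian_product(*arrays):
    # Build the N-dimensional index grid, then flatten it into rows of
    # coordinates, one row per combination.
    grids = np.meshgrid(*arrays, indexing="ij")
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))

print(cartesian_product(np.array([1, 2]), np.array([10, 20, 30])))
# [[ 1 10]
#  [ 1 20]
#  [ 1 30]
#  [ 2 10]
#  [ 2 20]
#  [ 2 30]]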
Re: [Numpy-discussion] What to do about structured string dtype and string regression?
On Tue, Feb 16, 2021 at 3:13 PM Sebastian Berg wrote:
> Hi all,
>
> In https://github.com/numpy/numpy/issues/18407 it was reported that
> there is a regression for `np.array()` and friends in NumPy 1.20 for
> code such as:
>
> np.array(["1234"], dtype=("U1", 4))
> # NumPy 1.20: array(['1', '1', '1', '1'], dtype='<U1')
> # NumPy 1.19: array(['1', '2', '3', '4'], dtype='<U1')
>
> [...]
>
> The Solution?
> -------------
>
> I honestly don't have one. We could consider strings as sequences in
> this weird special case. That will probably create other weird special
> cases, but they would be even more hidden (I expect mainly odder things
> throwing an error).
>
> Should we try to document this better in the release notes, or can we
> think of some better (or at least louder) solution?

There are way too many unsafe assumptions in this example. It's an edge
case of an edge case. I don't think we should be beholden to continuing to
support this behavior, which was obviously never anticipated. If there
were a way to raise a warning or error in potentially ambiguous situations
like this, I would support it.
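[One hedged sketch of such a guard, relying only on the documented
np.dtype.subdtype attribute; the function name and message are
illustrative.]

import numpy as np

def reject_subarray_dtype(dtype):
    # Subarray dtypes expose a non-None `subdtype`, so ambiguous
    # spellings like ("U1", 4) or "(4)U1," can be caught up front,
    # before any value coercion happens.
    dt = np.dtype(dtype)
    if dt.subdtype is not None:
        raise TypeError(
            f"subarray dtype {dt.subdtype!r} is ambiguous here; pass the "
            "base dtype and specify the shape explicitly instead"
        )
    return dt

reject_subarray_dtype("U1")          # returns dtype('<U1')
# reject_subarray_dtype(("U1", 4))   # raises TypeError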
[Numpy-discussion] NumPy Community Meeting Wednesday
Hi all,

There will be a NumPy community meeting on Wednesday, February 17th at
12pm Pacific Time (20:00 UTC). Everyone is invited and encouraged to join
in and edit the work-in-progress meeting topics and notes at:

https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes,

Sebastian