[Numpy-discussion] Introducing Arm Optimized Routines
Hello,

Here at Arm, we've been investigating how we can improve performance on AArch64. One way to do so is by integrating some existing optimized routines (https://github.com/ARM-software/optimized-routines), similar to the SVML methods for AVX512 that are currently included as a git submodule. Our intent is to include the optimized routines repository as an additional submodule, which we can then use to provide routines on AArch64 for ASIMD, SVE and beyond.

Currently, we're targeting 4-ULP as this aligns with libmvec (https://sourceware.org/glibc/wiki/libmvec) and the SVML integration (https://github.com/numpy/numpy/pull/19478). This is alongside adding sufficient error handling to pass the NumPy test suite, meeting the test requirements highlighted in the SVML integration (https://github.com/numpy/numpy/pull/19478#issuecomment-893001722).

We've already started curating the necessary functions; let us know if you have any feedback.

Cheers,
Chris

___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com
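For readers unfamiliar with the 4-ULP target mentioned above, here is a minimal sketch of how the distance in ULPs between two doubles can be counted. The helper names (`ordered_bits`, `ulp_distance`, `within_ulp`) are hypothetical and are not part of NumPy, libmvec or Optimized Routines. The sketch compares a result against a double-precision reference, which only approximates the error against the exact mathematical value.

```python
import struct


def ordered_bits(x: float) -> int:
    """Map a finite double to an integer that increases monotonically with x."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles are stored as sign + magnitude, so mirror them
    # around zero to make integer ordering match floating-point ordering.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Count how many representable doubles separate a from b."""
    return abs(ordered_bits(a) - ordered_bits(b))


def within_ulp(result: float, reference: float, max_ulp: int = 4) -> bool:
    """Check a result against a reference, with a 4-ULP budget by default."""
    return ulp_distance(result, reference) <= max_ulp
```

A routine would meet the budget at a given input if `within_ulp(fast_result, reference_result)` holds, where both values are assumed to be finite doubles computed elsewhere.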
[Numpy-discussion] Re: Introducing Arm Optimized Routines
Hi Matti,

Thanks for your questions :-)

> This seems like it would improve performance on aarch64. Would the routines
> also work with the Apple silicon?

Yip, I can't see a reason why that wouldn't be the case.

> If these are new routines, it would be better to implement them in terms of
> the numpy universal intrinsics rather than adding a new submodule.

These would be the same routines as seen in SVML (integrated here: https://github.com/numpy/numpy/blob/main/numpy/core/src/umath/loops_umath_fp.dispatch.c.src#L67), which use the universal intrinsics before using the SVML library. The actual surface area is minimal, so I'd propose we follow a similar path with our existing routines and then aim to apply universal intrinsics if that's possible in the future. Does that sound like a good approach?

Cheers,
Chris
[Numpy-discussion] Re: Introducing Arm Optimized Routines
Hello again :-)

Just as an update for the list, the first PR has now been raised to integrate Optimized Routines, demonstrating the performance improvements (sometimes 2x faster): https://github.com/numpy/numpy/pull/23171

Once we've achieved the initial milestone of getting these routines integrated and the performance improved, it would be interesting to understand what's required to translate them into universal intrinsics. I notice that SVE support (https://github.com/numpy/numpy/pull/22265) isn't quite ready for universal intrinsics, which leads me to believe we would need to use the library there either way?

Cheers,
Chris
[Numpy-discussion] Re: Precision changes to sin/cos in the next release?
Matthew Brett wrote:
> Hi,
> On Wed, May 31, 2023 at 8:40 AM Matti Picus matti.pi...@gmail.com wrote:
> > On 31/5/23 09:33, Jerome Kieffer wrote:
> > Hi Sebastian,
> > I had a quick look at the PR and it looks like you re-implemented the
> > sin-cos function using SIMD.
> > I wonder how it compares with SLEEF (header only library,
> > CPU-architecture agnostic SIMD implementation of transcendental
> > functions with precision validation). SLEEF is close to the Intel SVML
> > library in spirit but extended to multi-architecture (tested on PowerPC
> > and ARM for example).
> > This is just curiosity ...
> > Like Juan, I am afraid of this change since my code, which depends on
> > numpy for sin/cos used for rotation is likely to see large change of
> > behavior.
> > Cheers,
> > Jerome
> > I think we should revert the changes. They have proved to be disruptive,
> > and I am not sure the improvement is worth the cost.
> > The reversion should add a test that cements the current user expectations.
> > The path forward is a different discussion, but for the 1.25 release I
> > think we should revert.
> Is there a way to make the changes opt-in for now, while we go back to
> see if we can improve the precision?

This would be similar to the approach libmvec is taking (https://sourceware.org/glibc/wiki/libmvec), adding the `--disable-mathvec` option, although they favour the 4ULP variants rather than the higher accuracy ones by default. If someone can advise on the most appropriate place for such a toggle, I can look into adding it. I would prefer the default to be 4ULP to match libc, though.

Cheers,
Chris
[Numpy-discussion] Re: Precision changes to sin/cos in the next release?
Ralf Gommers wrote:
> On Wed, May 31, 2023 at 12:28 PM Chris Sidebottom chris.sidebot...@arm.com
> wrote:
> > Matthew Brett wrote:
> > Hi,
> > On Wed, May 31, 2023 at 8:40 AM Matti Picus matti.pi...@gmail.com wrote:
> > On 31/5/23 09:33, Jerome Kieffer wrote:
> > Hi Sebastian,
> > I had a quick look at the PR and it looks like you re-implemented the
> > sin-cos function using SIMD.
> > I wonder how it compares with SLEEF (header only library,
> > CPU-architecture agnostic SIMD implementation of transcendental
> > functions with precision validation). SLEEF is close to the Intel SVML
> > library in spirit but extended to multi-architecture (tested on PowerPC
> > and ARM for example).
> > This is just curiosity ...
> > Like Juan, I am afraid of this change since my code, which depends on
> > numpy for sin/cos used for rotation is likely to see large change of
> > behavior.
> > Cheers,
> > Jerome
> > I think we should revert the changes. They have proved to be disruptive,
> > and I am not sure the improvement is worth the cost.
> > The reversion should add a test that cements the current user
> > expectations.
> > The path forward is a different discussion, but for the 1.25 release I
> > think we should revert.
> > Is there a way to make the changes opt-in for now, while we go back to
> > see if we can improve the precision?
> > This would be similar to the approach libmvec is taking (
> > https://sourceware.org/glibc/wiki/libmvec), adding the
> > `--disable-mathvec` option, although they favour the 4ULP variants rather
> > than the higher accuracy ones by default. If someone can advise as to the
> > most appropriate place for such a toggle I can look into adding it, I would
> > prefer for the default to be 4ULP to match libc though.
> We have a build-time toggle for SVML (`disable-svml` in `meson_options.txt`
> and an `NPY_DISABLE_SVML` environment variable for the distutils build).
> This one should look similar I think - and definitely not separate Python
> API with `np.fastmath` or similar. The flag can then default to the old
> (higher-precision, slower) behavior for <2.0, and the fast version for >=2.0
> somewhere halfway through the 2.0 development cycle - assuming the
> tweak in precision that Sebastian suggests is possible will remove the
> worst accuracy impacts that have now been identified.
> The `libmvec` link above is not conclusive it seems to me Chris, given that
> the examples specify that one only gets the faster version with
> `-ffast-math`, hence it's off by default.

Argh, I think you're right and I misread it: `--disable-mathvec` is for the compilation of libc, not for the actual faster operations, which require `-ffast-math`. Apologies!

Cheers,
Chris

> Cheers,
> Ralf
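The version-dependent default Ralf describes (old, higher-precision behavior before 2.0, fast behavior from 2.0 onwards, with an explicit override) could look roughly like the sketch below. This is only an illustration under assumptions: the variable name `NPY_FAST_SINCOS` and the function `use_fast_loops` are hypothetical, and this is not NumPy's actual build code.

```python
import os


def use_fast_loops(version: str, environ=None) -> bool:
    """Decide whether the faster, lower-precision loops would be selected."""
    env = os.environ if environ is None else environ
    # NPY_FAST_SINCOS is a hypothetical toggle name used for illustration.
    value = env.get("NPY_FAST_SINCOS")
    if value is not None:
        # An explicit setting always wins over the version-based default.
        return value == "1"
    # No explicit setting: precise loops before 2.0, fast loops from 2.0 on.
    major = int(version.split(".")[0])
    return major >= 2
```

For example, `use_fast_loops("1.25.0", {})` keeps the precise loops, while passing `{"NPY_FAST_SINCOS": "1"}` as the environment opts in to the fast ones.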