[Numpy-discussion] Introducing Arm Optimized Routines

2022-11-08 Thread Chris Sidebottom
Hello,

Here at Arm, we've been investigating how to improve NumPy's performance on 
AArch64. One way to do this is by integrating some existing optimized routines 
(https://github.com/ARM-software/optimized-routines), similar to the SVML 
methods for AVX512 that are currently included as a git submodule. Our intent 
is to include the Optimized Routines repository as an additional submodule, 
which we can then use to provide routines on AArch64 for ASIMD, SVE and beyond.

Currently, we're targeting a maximum error of 4 ULP (units in the last place), 
as this aligns with libmvec (https://sourceware.org/glibc/wiki/libmvec) and the 
SVML integration (https://github.com/numpy/numpy/pull/19478). Alongside this, 
we're adding sufficient error handling to pass the NumPy test suite, meeting the 
test requirements highlighted in the SVML integration 
(https://github.com/numpy/numpy/pull/19478#issuecomment-893001722).
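As a rough illustration of what a 4 ULP target means in practice, the sketch 
below measures the worst-case error of a float32 ufunc in float32 ULPs against a 
float64 evaluation of the same function. It rests on assumptions: float64 is 
treated as the exact reference, the input range is arbitrary, and the helper 
`max_ulp_error` is hypothetical. `np.testing.assert_array_max_ulp` is NumPy's 
existing helper for asserting such a bound in a test.

```python
import numpy as np


def max_ulp_error(func, x32):
    """Worst-case error of func on float32 input, in float32 ULPs.

    Hypothetical helper: the float64 evaluation of the same function is
    treated as the exact reference, which is only an approximation.
    """
    approx = func(x32).astype(np.float64)
    ref = func(x32.astype(np.float64))
    # Size of one float32 ULP at the magnitude of each reference value.
    ulp = np.spacing(np.abs(ref).astype(np.float32)).astype(np.float64)
    return float(np.max(np.abs(approx - ref) / ulp))


rng = np.random.default_rng(0)
# Arbitrary sample range chosen for illustration.
x = rng.uniform(-100.0, 100.0, size=100_000).astype(np.float32)

for f in (np.sin, np.cos):
    print(f.__name__, max_ulp_error(f, x))

# Test-style check: every float32 result is within 4 ULP of the
# float64 reference rounded to float32.
np.testing.assert_array_max_ulp(
    np.sin(x), np.sin(x.astype(np.float64)).astype(np.float32), maxulp=4
)
```

An exactly rounded operation such as `np.negative` reports 0, which is a quick 
sanity check on the helper itself.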

We've already started curating the necessary functions; let us know if you have 
any feedback.

Cheers,
Chris

___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Introducing Arm Optimized Routines

2022-11-09 Thread Chris Sidebottom
Hi Matti,

Thanks for your questions :-)

> This seems like it would improve performance on aarch64. Would the routines 
> also work with the Apple silicon?

Yip, I can't see a reason why that wouldn't be the case.

> If these are new routines, it would be better to implement them in terms of 
> the numpy universal intrinsics rather than adding a new submodule.

These would cover the same routines as SVML (integrated here: 
https://github.com/numpy/numpy/blob/main/numpy/core/src/umath/loops_umath_fp.dispatch.c.src#L67),
which use the universal intrinsics before calling into the SVML library. The 
actual surface area is minimal, so I'd propose we follow a similar path with our 
existing routines and then aim to move to universal intrinsics if that's 
possible in the future. Does that sound like a good approach?

Cheers,
Chris


[Numpy-discussion] Re: Introducing Arm Optimized Routines

2023-02-08 Thread Chris Sidebottom
Hello again :-) 

Just as an update for the list, the first PR has now been raised to integrate 
Optimized Routines, demonstrating the performance improvements (sometimes 2x 
faster):
https://github.com/numpy/numpy/pull/23171
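For anyone wanting to see figures like these on their own hardware, a minimal 
timing sketch might look like the following. It only times whichever NumPy 
build is installed, so a before/after comparison means running it under builds 
with and without the Optimized Routines integration. The helper 
`seconds_per_call` is hypothetical, and the array size, value range and function 
list are arbitrary choices, not necessarily the functions the PR covers.

```python
import timeit

import numpy as np


def seconds_per_call(func, dtype, n=100_000, repeat=5, number=20):
    """Best-of-`repeat` wall time for one call of func over n random values.

    Hypothetical helper; sizes, range and repeat counts are arbitrary.
    """
    rng = np.random.default_rng(0)
    # Positive range so that np.log stays finite and warning-free.
    x = rng.uniform(0.1, 100.0, size=n).astype(dtype)
    out = np.empty_like(x)
    # Writing into a preallocated buffer keeps allocation out of the timing.
    times = timeit.repeat(lambda: func(x, out=out), repeat=repeat, number=number)
    return min(times) / number


for func in (np.sin, np.cos, np.exp, np.log):
    for dtype in (np.float32, np.float64):
        t = seconds_per_call(func, dtype)
        print(f"{func.__name__:>4} {np.dtype(dtype).name:>7}: {t * 1e6:.1f} us")
```

Taking the minimum over repeats is a common way to reduce noise from other 
processes; absolute numbers will vary with CPU and build flags.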

Once we've reached the initial milestone of integrating these routines and 
improving performance, it would be interesting to understand what's required to 
translate them into universal intrinsics. I notice that SVE support 
(https://github.com/numpy/numpy/pull/22265) isn't quite ready for universal 
intrinsics, which leads me to believe we would need to use the library there 
either way. Is that right?

Cheers,
Chris


[Numpy-discussion] Re: Precision changes to sin/cos in the next release?

2023-05-31 Thread Chris Sidebottom
Matthew Brett wrote:
> Hi,
> On Wed, May 31, 2023 at 8:40 AM Matti Picus matti.pi...@gmail.com wrote:
> > On 31/5/23 09:33, Jerome Kieffer wrote:
> > > Hi Sebastian,
> > > I had a quick look at the PR and it looks like you re-implemented the
> > > sin-cos function using SIMD.
> > > I wonder how it compares with SLEEF (header only library,
> > > CPU-architecture agnostic SIMD implementation of transcendental
> > > functions with precision validation). SLEEF is close to the Intel SVML
> > > library in spirit but extended to multi-architecture (tested on PowerPC
> > > and ARM for example).
> > > This is just curiosity ...
> > > Like Juan, I am afraid of this change since my code, which depends on
> > > numpy for sin/cos used for rotation is likely to see large change of
> > > behavior.
> > > Cheers,
> > > Jerome
> > I think we should revert the changes. They have proved to be disruptive,
> > and I am not sure the improvement is worth the cost.
> > The reversion should add a test that cements the current user expectations.
> > The path forward is a different discussion, but for the 1.25 release I
> > think we should revert.
> Is there a way to make the changes opt-in for now, while we go back to
> see if we can improve the precision?

This would be similar to the approach libmvec is taking 
(https://sourceware.org/glibc/wiki/libmvec), which adds a `--disable-mathvec` 
option, although they favour the 4 ULP variants over the higher-accuracy ones 
by default. If someone can advise on the most appropriate place for such a 
toggle, I can look into adding it, though I would prefer the default to be 
4 ULP to match libc.

Cheers,
Chris


[Numpy-discussion] Re: Precision changes to sin/cos in the next release?

2023-05-31 Thread Chris Sidebottom
Ralf Gommers wrote:
> On Wed, May 31, 2023 at 12:28 PM Chris Sidebottom chris.sidebot...@arm.com
> wrote:
> > Matthew Brett wrote:
> > > Hi,
> > > On Wed, May 31, 2023 at 8:40 AM Matti Picus matti.pi...@gmail.com wrote:
> > > > On 31/5/23 09:33, Jerome Kieffer wrote:
> > > > > Hi Sebastian,
> > > > > I had a quick look at the PR and it looks like you re-implemented the
> > > > > sin-cos function using SIMD.
> > > > > I wonder how it compares with SLEEF (header only library,
> > > > > CPU-architecture agnostic SIMD implementation of transcendental
> > > > > functions with precision validation). SLEEF is close to the Intel SVML
> > > > > library in spirit but extended to multi-architecture (tested on
> > > > > PowerPC and ARM for example).
> > > > > This is just curiosity ...
> > > > > Like Juan, I am afraid of this change since my code, which depends on
> > > > > numpy for sin/cos used for rotation is likely to see large change of
> > > > > behavior.
> > > > > Cheers,
> > > > > Jerome
> > > > I think we should revert the changes. They have proved to be
> > > > disruptive, and I am not sure the improvement is worth the cost.
> > > > The reversion should add a test that cements the current user
> > > > expectations.
> > > > The path forward is a different discussion, but for the 1.25 release I
> > > > think we should revert.
> > > Is there a way to make the changes opt-in for now, while we go back to
> > > see if we can improve the precision?
> > This would be similar to the approach libmvec is taking (
> > https://sourceware.org/glibc/wiki/libmvec), adding the
> > `--disable-mathvec` option, although they favour the 4ULP variants rather
> > than the higher accuracy ones by default. If someone can advise as to the
> > most appropriate place for such a toggle I can look into adding it, I would
> > prefer for the default to be 4ULP to match libc though.
> We have a build-time toggle for SVML (`disable-svml` in `meson_options.txt`
> and an `NPY_DISABLE_SVML` environment variable for the distutils build).
> This one should look similar I think - and definitely not separate Python
> API with `np.fastmath` or similar. The flag can then default to the old
> (higher-precision, slower) behavior for <2.0, and the fast version for >=2.0
> somewhere halfway through the 2.0 development cycle - assuming the tweak in
> precision that Sebastian suggests is possible will remove the worst accuracy
> impacts that have now been identified.
> The `libmvec` link above is not conclusive it seems to me Chris, given that
> the examples specify that one only gets the faster version with
> `-ffast-math`, hence it's off by default.

Argh, I think you're right and I misread it: `--disable-mathvec` controls the 
compilation of libc itself, not the faster vector operations, which require 
`-ffast-math`.

Apologies!

Cheers,
Chris

> Cheers,
> Ralf