[Numpy-discussion] Re: Curious performance difference with np.unique on arrays of characters

2023-09-14 Thread Devulapalli, Raghuveer
What processor are you running this on? np.sort uses AVX-512 accelerated 
sorting for np.int32, so I'm just wondering if that is the reason for this 
difference.

Raghuveer 

> -Original Message-
> From: sal...@caltech.edu 
> Sent: Wednesday, September 13, 2023 6:14 PM
> To: numpy-discussion@python.org
> Subject: [Numpy-discussion] Curious performance difference with np.unique on
> arrays of characters
> 
> Hello -
> 
> In the course of some genomics simulations, I seem to have come across a 
> curious
> (to me at least) performance difference in np.unique that I wanted to share. 
> (If
> this is not the right forum for this, please let me know!)
> 
> With a np.array of characters (U1), np.unique seems to be much faster when
> doing np.view as int -> np.unique -> np.view as U1 for arrays of decent size. 
> I
> would not have expected this since np.unique knows what's coming in as S1 and
> could handle the view-stuff internally. I've played with this a number of 
> ways (e.g.
> S1 vs U1; int32 vs int64; return_counts = True vs False; 100, 1000, or 10k
> elements) and seem to notice the same pattern. A short illustration below with
> U1, int32, return_counts = False, 10 vs 10k.
> 
> I wonder if this is actually intended behavior, i.e. the view-stuff is 
> actually a good
> idea for the user to think about and implement if appropriate for their 
> usecase (as
> it is for me).
> 
> Best regards,
> Shyam
> 
> 
> import numpy as np
> 
> charlist_10 = np.array(list('ASDFGHJKLZ'), dtype='U1')
> charlist_10k = np.array(list('ASDFGHJKLZ' * 1000), dtype='U1')
> 
> def unique_basic(x):
>     return np.unique(x)
> 
> def unique_view(x):
>     return np.unique(x.view(np.int32)).view(x.dtype)
> 
> In [27]: %timeit unique_basic(charlist_10)
> 2.17 µs ± 40.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
> 
> In [28]: %timeit unique_view(charlist_10)
> 2.53 µs ± 38.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
> 
> In [29]: %timeit unique_basic(charlist_10k)
> 204 µs ± 4.61 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
> 
> In [30]: %timeit unique_view(charlist_10k)
> 66.7 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
> 
> In [31]: np.__version__
> Out[31]: '1.25.2'
> 
> 
> 
> --
> Shyam Saladi
> https://shyam.saladi.org
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe
> send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: raghuveer.devulapa...@intel.com
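The view trick in the message above can be wrapped into a small helper. This is only a sketch under the assumptions in the thread: the input is a one-dimensional 'U1' array, so each element is a single 4-byte UCS-4 code point, an int32 view is lossless, and the integer sort order matches the character sort order (the name unique_chars is invented for illustration):

```python
import numpy as np

def unique_chars(x):
    # Each 'U1' element is one 4-byte UCS-4 code point, so viewing the
    # array as int32 is lossless and preserves ordering. np.unique then
    # takes the (potentially SIMD-accelerated) integer sorting path.
    assert x.dtype == np.dtype('U1'), "sketch only handles single characters"
    return np.unique(x.view(np.int32)).view(x.dtype)

chars = np.array(list('ASDFGHJKLZ' * 1000), dtype='U1')
# Same result as the plain call, computed via the integer view.
assert np.array_equal(unique_chars(chars), np.unique(chars))
print(unique_chars(chars))
```

Note the equality check: for single characters, comparing by code point and comparing as strings give the same ordering, which is why the round-trip through int32 is safe here.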


[Numpy-discussion] Re: welcome Raghuveer, Chris, Mateusz and Matt to the NumPy maintainers team

2024-01-29 Thread Devulapalli, Raghuveer
Thank you everyone. It’s been a pleasure being part of the NumPy community 😊

Raghuveer

From: Hameer Abbasi via NumPy-Discussion 
Sent: Saturday, January 27, 2024 9:20 AM
To: Discussion of Numerical Python 
Cc: Hameer Abbasi 
Subject: [Numpy-discussion] Re: welcome Raghuveer, Chris, Mateusz and Matt to 
the NumPy maintainers team

Welcome, Raghuveer, Chris, Mateusz and Matt!


On 26 Jan 2024, at 21:04, Ralf Gommers wrote:

Hi all,

We've got four new NumPy maintainers! Welcome to the team, and congratulations 
to:

- Raghuveer Devulapalli (https://github.com/r-devulap)
- Chris Sidebottom (https://github.com/mousius)
- Mateusz Sokół (https://github.com/mtsokol/)
- Matt Haberland (https://github.com/mdhaber)

Raghuveer and Chris have been contributing to the effort on SIMD and 
performance optimizations for quite a while now. Mateusz has done a lot of the 
heavy lifting on the Python API improvements for NumPy 2.0. And Matt has been 
contributing to the test infrastructure and docs.

Thanks to all four of you for the great work to date!

Cheers,
Ralf



[Numpy-discussion] Re: Moving the weekly triage/community meetings

2024-04-08 Thread Devulapalli, Raghuveer
+1 

> -Original Message-
> From: Matti Picus 
> Sent: Sunday, April 7, 2024 7:33 PM
> To: Discussion of Numerical Python 
> Subject: [Numpy-discussion] Moving the weekly triage/community meetings
> 
> Could we move the weekly community/triage meetings one hour later? Some
> participants have a permanent conflict, and the current time is inconvenient 
> for
> my current time zone.
> 
> Matti
> 


[Numpy-discussion] Transcendental Functions

2019-01-16 Thread Devulapalli, Raghuveer
Hello,

Are transcendental functions SIMD-vectorized in NumPy?

Raghuveer

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics

2020-02-04 Thread Devulapalli, Raghuveer
Hi everyone, 

I know I had raised these questions in the PR, but I wanted to post them on the 
mailing list as well.

1) Once NumPy adds the framework and initial set of Universal Intrinsics, if 
contributors want to leverage a new architecture-specific SIMD instruction, 
will they be expected to add a software implementation of this instruction for 
all other architectures too?

2) On whom does the burden lie to ensure that new implementations are 
benchmarked and show benefits on every architecture? What happens if 
optimizing a ufunc improves performance on one architecture and worsens it on 
another?

Thanks, 
Raghuveer


-Original Message-
From: NumPy-Discussion On Behalf Of Daniele Nicolodi
Sent: Tuesday, February 4, 2020 10:01 AM
To: numpy-discussion@python.org
Subject: Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics

On 04-02-2020 08:08, Matti Picus wrote:
> Together with Sayed Adel (cc) and Ralf, I am pleased to put the draft 
> version of NEP 38 [0] up for discussion. As per NEP 0, this is the 
> next step in the community accepting the approach laid out in the 
> NEP. The NEP PR [1] has already garnered a fair amount of discussion 
> about the viability of Universal SIMD Intrinsics, so I will try to 
> capture some of that here as well.

Hello,

more interesting prior art may be found in VOLK https://www.libvolk.org.
VOLK is developed mainly to be used in GNU Radio, and this is reflected in the 
available kernels and the supported data types. I think the approach used 
there may be of interest.

Cheers,
Dan


Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics

2020-02-11 Thread Devulapalli, Raghuveer
>> I think this doesn't quite answer the question. If I understand correctly, 
>> it's about a single instruction (e.g. one needs "VEXP2PD" and it's missing 
>> from the  supported AVX512 instructions in master). I think the answer is 
>> yes, it needs to be added for other architectures as well.

That adds a lot of overhead to writing SIMD-based optimizations, which can 
discourage contributors. It's also unreasonable to expect a developer to be 
familiar with the SIMD instructions of every architecture. On top of that, the 
performance implications aren't clear: software implementations of hardware 
instructions might perform worse and might not even produce the same result.

From: NumPy-Discussion On Behalf Of Ralf Gommers
Sent: Monday, February 10, 2020 9:17 PM
To: Discussion of Numerical Python 
Subject: Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics



On Tue, Feb 4, 2020 at 2:00 PM Hameer Abbasi wrote:
—snip—

> 1) Once NumPy adds the framework and initial set of Universal Intrinsics, if 
> contributors want to leverage a new architecture-specific SIMD instruction, 
> will they be expected to add a software implementation of this instruction for 
> all other architectures too?

In my opinion, if the instructions are lower-level, then yes. For example, one 
cannot add AVX-512 without also adding, for example, AVX-256, AVX-128, and 
SSE*. However, I would not expect one person or team to be an expert in all 
assemblies, so intrinsics for one architecture can be developed independently 
of another.

I think this doesn't quite answer the question. If I understand correctly, it's 
about a single instruction (e.g. one needs "VEXP2PD" and it's missing from the 
supported AVX512 instructions in master). I think the answer is yes, it needs 
to be added for other architectures as well. Otherwise, if universal intrinsics 
are added ad-hoc and there's no guarantee that a universal instruction is 
available for all main supported platforms, then over time there won't be much 
that's "universal" about the framework.

This is a different question though from adding a new ufunc implementation. I 
would expect accelerating ufuncs via intrinsics that are already supported to 
be much more common than having to add new intrinsics. Does that sound right?


> 2) On whom does the burden lie to ensure that new implementations are 
> benchmarked and show benefits on every architecture? What happens if 
> optimizing a ufunc improves performance on one architecture and worsens it on 
> another?

This is slightly hard to provide a recipe for. I suspect it may take a while 
before this becomes an issue, since we don't have much SIMD code to begin with. 
So adding new code with benchmarks will likely show improvements on all 
architectures (we should ensure benchmarks can be run via CI, otherwise it's 
too onerous). And if not and it's not easily fixable, the problematic platform 
could be skipped so performance there is unchanged.

Only once there's existing universal intrinsics and then they're tweaked will 
we have to be much more careful I'd think.

Cheers,
Ralf



I would look at this from a maintainability point of view. If we are increasing 
the code size by 20% for a certain ufunc, there must be a demonstrable 20% 
increase in performance on any CPU. That is to say, micro-optimisation will be 
unwelcome, and code readability will be preferable. Usually we ask the 
submitter of the PR to test the PR with a machine they have on hand, and I 
would be inclined to keep this trend of self-reporting. Of course, if someone 
else came along and reported a performance regression of, say, 10%, then we 
have increased code by 20%, with only a net 5% gain in performance, and the PR 
will have to be reverted.

—snip—


Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics

2020-02-12 Thread Devulapalli, Raghuveer
>> I hope there will not be a demand to use many non-universal intrinsics in 
>> ufuncs, we will need to work this out on a case-by-case basis in each ufunc. 
>> Does that sound reasonable? Are there intrinsics you have already used that 
>> have no parallel on other platforms?

I think that is reasonable. It's hard to anticipate the future need and 
benefit of specialized intrinsics, but I tried to make a list of some of the 
specialized intrinsics currently in use in NumPy that I don't believe exist on 
other platforms (most of these actually don't exist on AVX2 either). I am not 
an expert in the ARM or VSX architectures, so please correct me if I am wrong.

a. _mm512_mask_i32gather_ps
b. _mm512_mask_i32scatter_ps/_mm512_mask_i32scatter_pd
c. _mm512_maskz_loadu_pd/_mm512_maskz_loadu_ps
d. _mm512_getexp_ps
e. _mm512_getmant_ps
f. _mm512_scalef_ps
g. _mm512_permutex2var_ps, _mm512_permutex2var_pd
h. _mm512_maskz_div_ps, _mm512_maskz_div_pd
i. _mm512_permute_ps/_mm512_permute_pd 
j. _mm512_sqrt_ps/pd (I could be wrong on this one, but from the little google 
search I did, it seems like power ISA doesn’t have a vectorized sqrt 
instruction)

Software implementations of these instructions are definitely possible. But 
some of them are not trivial to implement and are surely not going to be 
one-line macros either. I am also unsure of the performance implications, but 
we will hopefully find out once we convert these to universal intrinsics and 
then benchmark.
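For readers unfamiliar with these intrinsics, here is a rough NumPy-level analogue of the masked gather in (a). The helper name, shapes, and values are invented purely for illustration and say nothing about how a real SIMD implementation would be written:

```python
import numpy as np

def masked_gather(src, idx, mask, fallback):
    # Analogue of _mm512_mask_i32gather_ps: for lanes where the mask is
    # set, load src[idx]; for the other lanes, keep the fallback value.
    # Hardware does this in a single vector instruction.
    out = np.array(fallback, dtype=src.dtype, copy=True)
    out[mask] = src[idx[mask]]
    return out

src = np.arange(10.0, dtype=np.float32)       # table to gather from
idx = np.array([9, 2, 5, 0], dtype=np.int32)  # per-lane indices
mask = np.array([True, False, True, True])    # lane 1 is masked off
fallback = np.full(4, -1.0, dtype=np.float32)
print(masked_gather(src, idx, mask, fallback))
```

The masked-off lane keeps its fallback value (-1.0) while the active lanes pick up src[9], src[5], and src[0].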

Raghuveer

-Original Message-
From: NumPy-Discussion On Behalf Of Matti Picus
Sent: Tuesday, February 11, 2020 11:19 PM
To: numpy-discussion@python.org
Subject: Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics

On 11/2/20 8:02 pm, Devulapalli, Raghuveer wrote:
>
> On top of that the performance implications aren’t clear. Software 
> implementations of hardware instructions might perform worse and might 
> not even produce the same result.
>

The proposal for universal intrinsics does not enable replacing an intrinsic on 
one platform with a software emulation on another: the intrinsics are meant to 
be compile-time defines that overlay the universal intrinsic with a platform 
specific one. In order to use a new intrinsic, it must have parallel intrinsics 
on the other platforms, or it cannot be used there: "NPY_CPU_HAVE(FEATURE_NAME)" 
will always return false so the compiler will not even build a loop for that 
platform. I will try to clarify that intention in the NEP.


I hope there will not be a demand to use many non-universal intrinsics in 
ufuncs, we will need to work this out on a case-by-case basis in each ufunc. 
Does that sound reasonable? Are there intrinsics you have already used that 
have no parallel on other platforms?


Matti



Re: [Numpy-discussion] NumPy team update

2020-06-19 Thread Devulapalli, Raghuveer
Hi Ralf,

Thank you for the acknowledgement. I am happy to contribute and hope to 
continue to do so in the future.

Raghuveer

From: NumPy-Discussion On Behalf Of Ralf Gommers
Sent: Thursday, June 18, 2020 2:58 PM
To: Discussion of Numerical Python 
Subject: [Numpy-discussion] NumPy team update

Hi all,

The NumPy team is growing, and it's awesome to see everything that is going on. 
Hard to keep up with, but that's a good thing! I think it's a good time for an 
update on people who gained commit rights, or joined one of the teams we now 
have.

For those who haven't seen it yet, we have a team gallery at 
https://numpy.org/gallery/team.html. It isn't yet updated for the changes in 
this email, but gives a good picture of the current state.

Matti Picus joined the Steering Council. He has been one of the driving forces 
behind NumPy for well over two years now, and we're very glad to have him join 
the council.

Ross Barnowski, Melissa Weber Mendonça, Josh Wilson and Bas van Beek gained 
commit rights. Ross has worked on the docs and reviewed lots of doc PRs for the 
last six months.  Melissa has led the doc structuring and tutorial writing 
effort and has done a good amount of f2py maintenance as well. Josh and Bas 
have been pushing the type annotation work forward, first in the numpy-stubs 
repo and now in master. It's great to have experts in all these topics join the 
team.

Furthermore, we now have 10+ people in the community calls, the triage calls 
and the docs team calls (all bi-weekly and on the NumPy community calendar [1] 
- everyone is welcome). And there's more going on - I feel like I should 
mention some of the other excellent work going on:

A lot of work is going into SIMD optimizations. Sayed Adel has made very nice 
progress on implementing universal intrinsics (NEP 38), and Raghuveer 
Devulapalli, Chunlin Fang and others have contributed SSE/AVX and ARM Neon 
implementations for many functions.

For the website, Shaloo Shalini has continued working on new case studies, 
we're about to merge a really nice one on tracking animal movement. Ben 
Nathanson has contributed his technical writing and editing skills to improve 
our website and documentation content. And Isabela Presedo-Floyd has taken up 
the challenge of redesigning the NumPy logo, and we're nearing the end of the 
process there.

The survey team has also been working hard. Inessa Pawson, Xiaoyi Deng, 
Stephanie Mendoza, Ross Barnowski, Sebastian Berg and a number of volunteers 
for translations are getting a really well-designed survey together.

And then of course there's both old hands and new people doing the regular 
maintenance and enhancement work on the main repo.

Writing this email started with "we just gave out some commit rights, we should 
put that on the mailing list". Then I realized there's lots of other people and 
activities that deserve a shout out. And probably more that I forgot (if so, 
apologies!). I'll stop here - thanks everyone for all you do!

Cheers,
Ralf


[1] 
https://calendar.google.com/calendar?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20



[Numpy-discussion] Re: [RFC] - numpy/SVML appears to be poorly optimized

2021-11-05 Thread Devulapalli, Raghuveer
They are meant to be optimized. Any contribution to improve them further is 
more than welcome. 

Raghuveer

-Original Message-
From: Noah Goldstein  
Sent: Thursday, November 4, 2021 10:46 AM
To: numpy-discussion@python.org
Subject: [Numpy-discussion] [RFC] - numpy/SVML appears to be poorly optimized

The numpy SVML library: https://github.com/numpy/SVML

appears to be poorly optimized. Since it's just a raw assembly dump, this also 
makes it quite difficult to improve (with either a better compiler or by hand).

Some of the glaring issues are:
1. register allocation / spilling
2. rodata layouts / const-propagation of the values.
3. Very odd use of internal functions that really ought to be inlined.

Are these functions meant to be heavily optimized?

If so, are people open to patches that optimize them (either with new C 
implementations or in the current assembly implementations)?


[Numpy-discussion] Re: Precision changes to sin/cos in the next release?

2023-05-31 Thread Devulapalli, Raghuveer
I wouldn't discount the performance impact on real-world benchmarks for 
these functions. Just to name a couple of examples:

  *   A 7x speed-up of np.exp and np.log results in a 2x speed-up of training 
neural networks like logistic regression [1]. I would expect np.tanh to show 
similar results for neural networks.
  *   Vectorizing even simple functions like np.maximum results in a 1.3x 
speed-up of sklearn's KMeans algorithm [2]
Raghuveer

[1] https://github.com/numpy/numpy/pull/13134
[2] https://github.com/numpy/numpy/pull/14867
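The kind of comparison behind such claims can be reproduced with a small harness like the one below, pitting the vectorized ufunc against a scalar math.exp loop over the same data. Array size and repeat count are arbitrary choices, and the ratio you see depends on your CPU and NumPy build, so no particular number is claimed here:

```python
import math
import timeit

import numpy as np

# Time the vectorized ufunc vs. a scalar math.exp loop on the same data.
x = np.random.default_rng(0).random(10_000)

t_ufunc = timeit.timeit(lambda: np.exp(x), number=10)
t_loop = timeit.timeit(lambda: [math.exp(v) for v in x], number=10)
print(f"np.exp: {t_ufunc:.4f}s  scalar loop: {t_loop:.4f}s")

# Whatever the speed gap, the results agree to floating-point tolerance.
assert np.allclose(np.exp(x), [math.exp(v) for v in x])
```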


[Numpy-discussion] Proposal to accept NEP 54: Adopt Google Highway for developing SIMD kernels

2025-03-18 Thread Devulapalli, Raghuveer
Hi all,

It's been more than a year since NEP 54 was drafted (see 
https://numpy.org/neps/nep-0054-simd-cpp-highway.html). A PR has been opened to 
change the status of NEP 54 to "Accepted" 
(https://github.com/numpy/numpy/pull/28556). A few details to iron out:


  1.  Static/dynamic dispatch: Highway is currently used only for static 
dispatch (leveraging Highway intrinsics), and we still use NumPy/Meson 
infrastructure to do the dynamic dispatching in NumPy. Given it has worked well 
for us so far without any issues, I don't see a big motivation to switch to 
Highway's dynamic dispatch. Please voice your opinion if you think otherwise.
  2.  MSVC support: Given that Highway support on MSVC is limited and prone to 
bugs, we are not sure if NumPy wants to support building from source with MSVC. 
Several options were discussed at the community meeting and are summarized 
below:
     a.  Drop MSVC support entirely and switch to clang-cl on Windows.
     b.  Support building with MSVC but disable SIMD kernels written using 
Highway.
     c.  Continue supporting MSVC for SIMD kernels with static dispatch, with 
the caveat that it could be buggy.

Most people agreed with option (b) and encourage people who build NumPy with 
MSVC to switch to clang-cl if they care about performance.

Since incorporating Highway into NumPy, the community has successfully ported 
existing C SIMD kernels to Highway [1], [2], [3], and a few more are works in 
progress [4], [5], [6].

Raghuveer

[1] https://github.com/numpy/numpy/pull/28368
[2] https://github.com/numpy/numpy/pull/25934
[3] https://github.com/numpy/numpy/pull/25781
[4] https://github.com/numpy/numpy/pull/28490
[5] https://github.com/numpy/numpy/pull/27402
[6] https://github.com/numpy/numpy/pull/26346