[Bug target/119702] PPCLE: Inefficient auto-vectorization for 64-bit shifts on Power9

avinashd at linux dot ibm.com via Gcc-bugs Fri, 01 Aug 2025 05:00:00 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119702


--- Comment #7 from Avinash Jayakar <avinashd at linux dot ibm.com> ---
(In reply to Peter Bergner from comment #6)
> If the vaddudm is the fastest sequence, then yes.

Got it. I referred to the Power10 user manual for the latency information:
vaddudm has a better CPI, minimum latency, and maximum latency than the
combination of the two instructions.
So I will work on making this change (using vaddudm in place of the splat and
shift).
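
For reference, a minimal sketch of why an add can stand in for the splat and
shift. It assumes the shift in question is a left shift by one bit, which this
comment does not state. Under that assumption x << 1 equals x + x modulo 2^64,
which is what vaddudm (Vector Add Unsigned Doubleword Modulo) computes per
element. The function names below are hypothetical and are not taken from the
bug's test case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift form: each 64-bit element is shifted left by one bit.
   ASSUMPTION: a shift count of 1; the comment does not give the count. */
void shl1_shift(uint64_t *dst, const uint64_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] << 1;
}

/* Add form: x + x. Unsigned addition wraps modulo 2^64, so the result
   matches the shift form for every input value. */
void shl1_add(uint64_t *dst, const uint64_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + src[i];
}

/* Hypothetical self-check: both forms agree, including when the top bit
   is shifted out or the addition wraps. */
void check_shl1_equivalence(void)
{
    const uint64_t in[4] = { 0, 1, UINT64_C(0x8000000000000001), UINT64_MAX };
    uint64_t a[4], b[4];

    shl1_shift(a, in, 4);
    shl1_add(b, in, 4);
    for (size_t i = 0; i < 4; i++)
        assert(a[i] == b[i]);
}
```

Compiling both loops with something like -O3 -mcpu=power9 and comparing the
generated assembly should show whether the vectorizer picks the same sequence
for each form.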


> I think you should fire off a git bisect to track down which commit changed
> the behavior from GCC 14 and then you can decide how to fix it.

Is this to find out why vextsb2d was being generated? Apart from that, the
instruction sequence is the same, right?
