[Bug target/117722] RISC-V: Failed to vectorize x264_pixel_sad_4x4

vineetg at gcc dot gnu.org via Gcc-bugs Tue, 10 Dec 2024 14:35:08 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722


--- Comment #16 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
(In reply to Robin Dapp from comment #15)
> (In reply to Vineet Gupta from comment #14)

> > @Robin, it seems the current codegen generates 2 widening ops, which might
> > not be as efficient. We have done some profiling of widening add throughput
> > and Edwin's data tells me that the throughput might not be the same.
> 
> Hmm, would you ever want the widening ops if the throughput is worse then? 
> I.e. if you had a throughput of 2 for simple adds and zexts but 1 for vwadd
> could you not disable them altogether if they "clog" the pipeline?

Right. We need to experiment some more and see how it plays out on real hw.

But the real point here is that we don't need the widening semantics at all,
let alone twice. A min+max+sub in the loop, with a final reducing sum, should
do the trick.
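A minimal C sketch of that point follows. It assumes a plain 4x4 block of 8-bit pixels addressed with row strides. The function names are hypothetical, and the signature only loosely follows the shape of x264's C reference. For unsigned bytes, |a - b| == max(a, b) - min(a, b), so the per-pixel difference never leaves the 8-bit range, and only the final reducing sum needs a wider accumulator. In RVV terms one might expect vmaxu.vv / vminu.vv / vsub.vv followed by a widening reduction such as vwredsumu.vs. That mapping is an assumption, not a description of what GCC currently emits.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: SAD over a 4x4 block of 8-bit pixels, written the
 * "widening" way. Each byte is converted to int before the subtraction,
 * so every per-pixel difference is computed in a wider type. */
int sad_4x4_widening(const uint8_t *pix1, intptr_t stride1,
                     const uint8_t *pix2, intptr_t stride2)
{
    int sum = 0;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int d = (int)pix1[x] - (int)pix2[x]; /* widen, then subtract */
            sum += abs(d);
        }
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}

/* Hypothetical helper: the same SAD, but |a - b| is computed as
 * max(a, b) - min(a, b), which always fits in uint8_t. Only the final
 * reduction uses a wider (int) accumulator. */
int sad_4x4_minmax(const uint8_t *pix1, intptr_t stride1,
                   const uint8_t *pix2, intptr_t stride2)
{
    uint8_t diff[16];
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            uint8_t a = pix1[x], b = pix2[x];
            uint8_t hi = a > b ? a : b;           /* unsigned max */
            uint8_t lo = a < b ? a : b;           /* unsigned min */
            diff[y * 4 + x] = (uint8_t)(hi - lo); /* stays in 0..255 */
        }
        pix1 += stride1;
        pix2 += stride2;
    }
    /* Final reducing sum: the only step that needs more than 8 bits. */
    int sum = 0;
    for (int i = 0; i < 16; i++)
        sum += diff[i];
    return sum;
}
```

Both variants return the same value for any input. The difference lies only in where the widening happens: per element in the first, once at the reduction in the second.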

I'm just going by the data Edwin generated running microprobes on BPI3 (for
back-to-back ops). I don't think he has posted it to the public portal yet [1].

[1] https://github.com/ewlu/bp3-microarch
