[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

liuhongt at gcc dot gnu.org via Gcc-bugs Sun, 19 May 2024 18:23:28 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069


--- Comment #11 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Haochen Jiang from comment #10)
> A patch like Comment 8 could definitely solve the problem. But I need to
> test more benchmarks to see if there is surprise.
> 
> But, yes, as Uros said in Comment 9, maybe there is a chance we could do it
> better.

Could you add "arch=skylake-avx512" to target_clones and try disabling
ix86_expand_vecop_qihi2 entirely, to see whether performance improves?
On x86, cross-lane permutation (truncation) is not very efficient (3-4 cycles
for both vpermq and vpmovwb).
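
For reference, a minimal sketch of what the suggested target_clones change
could look like. The kernel below (mul_u8) and the BYTE_CLONES macro are
hypothetical and not taken from the reporter's benchmark. The clone list
other than "arch=skylake-avx512" is an assumption, and so is the choice of
an 8-bit multiply as the operation that gets widened and truncated back.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: this clone list is illustrative. The comment above only asks
   for "arch=skylake-avx512" to be added to an existing target_clones list. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define BYTE_CLONES \
  __attribute__((target_clones("default", "avx2", "arch=skylake-avx512")))
#else
#define BYTE_CLONES /* no function multiversioning on this target */
#endif

/* Hypothetical kernel: element-wise 8-bit multiply. The product is computed
   in int and truncated back to 8 bits, so a vectorized version has to widen
   the bytes and then narrow the results again. */
BYTE_CLONES
void mul_u8(uint8_t *restrict dst, const uint8_t *restrict a,
            const uint8_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] * b[i]);
}
```

Building this at -O3 and comparing the generated code for each clone (for
example with objdump -d) should show whether the narrowing step uses vpermq
or vpmovwb in a given clone.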
