
munroesj at gcc dot gnu.org via Gcc-bugs Sat, 26 Oct 2024 11:32:34 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117007


--- Comment #6 from Steven Munroe <munroesj at gcc dot gnu.org> ---
I am starting to see a pattern, and I wonder whether the compiler is confused by
assuming that the shift count must match the width/type of the shift/rotate target.

This assumption is implied all the way back to the AltiVec PIM, and also by the
current Intrinsic Reference and the GCC documentation. The intrinsics vec_rl(),
vec_sl(), vec_sr(), and vec_sra() all require the shift count to have the same
(unsigned) type and element size as the value being shifted/rotated (operand a).

This might lead the compiler to think that it MUST properly zero- or sign-extend
any shift count. But that is wrong.

The PowerISA only requires the shift count to be in the low-order bits of each
element (3 to 7 bits, depending on element size). Any higher-order bits of the
element are don't-cares.
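As a rough illustration of the low-order-bits rule, here is a portable C sketch that models one element of vslb, vslh, vslw, and vsld in scalar code. These are not the real intrinsics. The helper names (shl_byte_elem and the others) are hypothetical. The masks (low 3, 4, 5, and 6 bits) follow the per-element rule stated above. The check shows that counts differing only in their high-order bits give identical results, including a zero-extended count, a byte-splatted count, and a sign-extended one.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar models (NOT the real intrinsics) of one element of the
 * PowerISA vector shift-left instructions. Each uses only the
 * low-order log2(element width) bits of its shift-count element. */
static uint8_t shl_byte_elem(uint8_t a, uint8_t count)      /* vslb */
{
    return (uint8_t)(a << (count & 0x07));   /* low 3 bits */
}

static uint16_t shl_half_elem(uint16_t a, uint16_t count)   /* vslh */
{
    return (uint16_t)(a << (count & 0x0F));  /* low 4 bits */
}

static uint32_t shl_word_elem(uint32_t a, uint32_t count)   /* vslw */
{
    return a << (count & 0x1F);              /* low 5 bits */
}

static uint64_t shl_dword_elem(uint64_t a, uint64_t count)  /* vsld */
{
    return a << (count & 0x3F);              /* low 6 bits */
}

/* Counts that differ only in their high-order bits give the same
 * result, so zero- or sign-extending a constant count buys nothing. */
static void check_high_bits_ignored(void)
{
    const uint32_t counts[] = {
        0x00000005u,  /* zero-extended 5 */
        0x00000025u,  /* extra high bits, low 5 bits == 5 */
        0x05050505u,  /* byte 5 splatted across the word */
        0xFFFFFFE5u,  /* sign-extended byte 0xE5, low 5 bits == 5 */
    };
    for (unsigned i = 0; i < sizeof counts / sizeof counts[0]; i++)
        assert(shl_word_elem(1u, counts[i]) == 32u);
}
```

The word case is the one exercised by check_high_bits_ignored(). The byte, halfword, and doubleword models apply the same idea with their own masks.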

So the shift count (operand b) could just as well be a vector unsigned char
(byte elements). In fact, vec_sll(), vec_slo(), vec_srl(), and vec_sro()
already allow this.

So the compiler can correctly use vspltisb, vspltish, vspltisw, or xxspltib for
any vector shift/rotate where the shift count is a compile-time constant.
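A second sketch, again portable scalar C rather than real intrinsics, asks which compile-time counts a single byte splat can produce. It assumes three things. First, vspltisb splats a sign-extended 5-bit immediate (-16..15) into every byte. Second, xxspltib (PowerISA 3.0) splats an 8-bit immediate into every byte. Third, the element uses at most its low 8 bits as the count. The helper names are hypothetical. Under these assumptions, vspltisb alone reaches every word count from 0 to 31. Doubleword counts 16 through 47 are out of reach for vspltisb but reachable with xxspltib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not GCC internals. Every byte of a byte splat
 * holds the same value, so for bits <= 8 the low `bits` bits of any
 * wider element equal the low `bits` bits of that byte. */

/* vspltisb model: 5-bit signed immediate -16..15, sign-extended per byte. */
static bool vspltisb_can_make(unsigned count, unsigned bits)
{
    assert(bits >= 1 && bits <= 8);
    unsigned mask = (1u << bits) - 1u;
    for (int simm = -16; simm <= 15; simm++) {
        uint8_t byte = (uint8_t)simm;      /* e.g. -16 becomes 0xF0 */
        if ((byte & mask) == count)
            return true;
    }
    return false;
}

/* xxspltib model: 8-bit immediate 0..255 copied into every byte. */
static bool xxspltib_can_make(unsigned count, unsigned bits)
{
    assert(bits >= 1 && bits <= 8);
    unsigned mask = (1u << bits) - 1u;
    for (unsigned imm = 0; imm <= 255; imm++) {
        if ((imm & mask) == count)
            return true;
    }
    return false;
}

static void check_splat_coverage(void)
{
    /* Word shifts use 5 bits: vspltisb covers every count 0..31. */
    for (unsigned c = 0; c < 32; c++)
        assert(vspltisb_can_make(c, 5));
    /* Doubleword shifts use 6 bits: vspltisb misses 16..47 ... */
    assert(!vspltisb_can_make(16, 6));
    assert(!vspltisb_can_make(47, 6));
    assert(vspltisb_can_make(48, 6));  /* 0xF0 & 0x3F == 48 */
    /* ... while xxspltib covers every count 0..63. */
    for (unsigned c = 0; c < 64; c++)
        assert(xxspltib_can_make(c, 6));
}
```

In both helpers the count is checked only against the low-order bits, which mirrors the don't-care rule for the high-order bits.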

This always gives smaller and faster code than loading vector constants from .rodata.

[Bug target/117007] Poor optimization for small vector constants needed for vector shift/rotate/mask generation
