For svmul, if one of the operands is a constant vector whose elements are all the same power of 2, this patch folds the multiplication to a left-shift by immediate (svlsl). Because the shift amount in svlsl is the second operand, the operands are swapped if the first operand contains the powers of 2. However, this swap is not valid for every predication: if the predication is _m and the predicate is not ptrue, the result of svlsl might differ from that of svmul, because the inactive lanes take their value from the first operand. Therefore, we do not apply the fold in this case. The transform is also not applied to INTMIN for signed integers or to constant vectors of 1 (this case is partially covered by constant folding already, and the missing cases will be addressed by the follow-up patch suggested in https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663275.html).
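To illustrate why the operand swap is unsafe for _m with a non-ptrue predicate, here is a minimal scalar sketch in portable C. It is not part of the patch: LANES, model_mul_m, model_lsl_m and swapped_fold_mismatches are hypothetical names, and the model assumes only the usual merging rule that inactive lanes keep the value of the first vector operand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* hypothetical fixed vector length used only by this model */

/* Scalar model of svmul with _m predication: active lanes hold a * b,
   inactive lanes keep the FIRST operand a.  */
static void
model_mul_m (const bool *pg, const int32_t *a, const int32_t *b, int32_t *out)
{
  for (size_t i = 0; i < LANES; i++)
    out[i] = pg[i] ? (int32_t) ((uint32_t) a[i] * (uint32_t) b[i]) : a[i];
}

/* Scalar model of svlsl by immediate with _m predication: active lanes hold
   a << shift, inactive lanes keep the first operand a.  */
static void
model_lsl_m (const bool *pg, const int32_t *a, unsigned shift, int32_t *out)
{
  for (size_t i = 0; i < LANES; i++)
    out[i] = pg[i] ? (int32_t) ((uint32_t) a[i] << shift) : a[i];
}

/* Count the lanes in which mul_m (pg, splat (4), x) differs from the
   swapped form lsl_m (pg, x, 2).  */
static size_t
swapped_fold_mismatches (const bool *pg, const int32_t *x)
{
  int32_t four[LANES], mul[LANES], lsl[LANES];
  size_t mismatches = 0;

  for (size_t i = 0; i < LANES; i++)
    four[i] = 4;

  model_mul_m (pg, four, x, mul); /* power of 2 is the first operand */
  model_lsl_m (pg, x, 2, lsl);    /* operands swapped for the shift */

  for (size_t i = 0; i < LANES; i++)
    if (mul[i] != lsl[i])
      mismatches++;
  return mismatches;
}
```

With an all-true predicate the two forms agree in every lane. With a partial predicate, the inactive lanes of the multiply hold 4 while those of the shift hold x, so they differ wherever x is not 4.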
Tests were added in the existing test harness to check the produced assembly
- when the first or second operand contains the power of 2
- when the second operand is a vector or scalar (_n)
- for _m, _z, _x predication
- for _m with ptrue or non-ptrue
- for intmin for signed integer types
- for the maximum power of 2 for signed and unsigned integer types.

Note that we used 4 as a power of 2, instead of 2, because a recent patch optimizes left-shifts by 1 to an add instruction. But since we wanted to highlight the change to an lsl instruction we used a higher power of 2.
To also check correctness, runtime tests were added.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
	Implement fold to svlsl for power-of-2 operands.

gcc/testsuite/
	* gcc.target/aarch64/sve/acle/asm/mul_s8.c: New test.
	* gcc.target/aarch64/sve/acle/asm/mul_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
	* gcc.target/aarch64/sve/mul_const_run.c: Likewise.