Ping.

> On 20 Sep 2024, at 11:28, Jennifer Schmitz <jschm...@nvidia.com> wrote:
> 
> For svmul, if one of the operands is a constant vector with a uniform
> power of 2, this patch folds the multiplication to a left-shift by
> immediate (svlsl).
> Because the shift amount in svlsl is the second operand, the operands are
> swapped if the first operand contains the powers of 2. However, this swap
> is not valid for some predications: if the predication is _m and the
> predicate is not ptrue, the result of svlsl might differ from that of
> svmul. Therefore, we do not apply the fold in this case.
> The transform is also not applied to INTMIN for signed integers and to
> constant vectors of 1 (this case is partially covered by constant folding
> already and the missing cases will be addressed by the follow-up patch
> suggested in
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663275.html).
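
To illustrate the restrictions described above, here is a minimal scalar
sketch in C. It is not the code from the patch: the helper names, the fixed
lane count and the 32-bit lane width are all assumptions made for
illustration. It models _m predication as "inactive lanes keep the first
operand" and shows that swapping the operands only preserves the result when
every lane is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4 /* hypothetical fixed lane count; real SVE is length-agnostic */

/* Return the shift amount if a multiplication by C may be folded to a left
   shift under the rules described above, or -1 otherwise.  C is the lane
   constant widened to 64 bits, so a signed INT32_MIN arrives negative.  */
static int
fold_shift_amount (int64_t c)
{
  /* Rejects 0, 1, negative values (including INT_MIN) and non-powers of 2.  */
  if (c <= 1 || (c & (c - 1)) != 0)
    return -1;
  int shift = 0;
  while ((c >> shift) != 1)
    shift++;
  assert (((int64_t) 1 << shift) == c);
  return shift;
}

/* Scalar model of a multiply with _m predication: inactive lanes keep OP1.  */
static void
model_mul_m (const bool *pg, const uint32_t *op1, const uint32_t *op2,
	     uint32_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = pg[i] ? op1[i] * op2[i] : op1[i];
}

/* Scalar model of a left shift by immediate with _m predication.  */
static void
model_lsl_m (const bool *pg, const uint32_t *op1, unsigned shift,
	     uint32_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = pg[i] ? op1[i] << shift : op1[i];
}

/* Compare mul_m (pg, 4, x) with the operand-swapped form lsl_m (pg, x, 2).  */
static bool
swapped_fold_matches (const bool *pg)
{
  const uint32_t four[LANES] = { 4, 4, 4, 4 };
  const uint32_t x[LANES] = { 3, 5, 7, 9 };
  uint32_t mul[LANES], lsl[LANES];

  model_mul_m (pg, four, x, mul);
  model_lsl_m (pg, x, (unsigned) fold_shift_amount (4), lsl);
  for (int i = 0; i < LANES; i++)
    if (mul[i] != lsl[i])
      return false;
  return true;
}
```

With an all-true predicate, swapped_fold_matches returns true. With a
partial predicate it returns false, because the inactive lanes hold 4 in the
multiply form but x in the shift form.
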
> 
> Tests were added in the existing test harness to check the produced assembly:
> - when the first or second operand contains the power of 2
> - when the second operand is a vector or scalar (_n)
> - for _m, _z, _x predication
> - for _m with ptrue or non-ptrue
> - for intmin for signed integer types
> - for the maximum power of 2 for signed and unsigned integer types.
> Note that we used 4 as the power of 2, instead of 2, because a recent
> patch optimizes left-shifts by 1 to an add instruction. Since we wanted
> to highlight the change to an lsl instruction, we used a higher power
> of 2.
> To also check correctness, runtime tests were added.
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
> 
> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
> 
> gcc/
> * config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
> Implement fold to svlsl for power-of-2 operands.
> 
> gcc/testsuite/
> * gcc.target/aarch64/sve/acle/asm/mul_s8.c: New test.
> * gcc.target/aarch64/sve/acle/asm/mul_s16.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
> * gcc.target/aarch64/sve/mul_const_run.c: Likewise.
> <0001-SVE-intrinsics-Fold-svmul-with-constant-power-of-2-o.patch>
