On Neoverse V2, SVE ADD instructions have a throughput of 4, while shift
instructions like SHL have a throughput of 2. We can lean on that to emit code
like:
 add    z31.b, z31.b, z31.b
instead of:
 lsl    z31.b, z31.b, #1
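
As a sanity check on the rewrite itself, the per-lane equivalence can be
sketched in plain C. This is only an illustration for byte lanes: the helper
names (shl1_u8, add_self_u8, count_mismatches_u8, self_check) are hypothetical
and are not part of the patch or its tests. It assumes wrap-around (modulo 256)
lane arithmetic, which is how the .b forms of both instructions behave.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models "lsl z.b, z.b, #1" one byte lane at a time.  */
static void shl1_u8(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] << 1);      /* top bit is discarded */
}

/* Hypothetical helper: models "add z.b, z.b, z.b" one byte lane at a time.  */
static void add_self_u8(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] + src[i]);  /* wraps modulo 256 */
}

/* Runs both forms over every possible byte value and counts disagreements.  */
static size_t count_mismatches_u8(void)
{
    uint8_t in[256], shifted[256], added[256];
    size_t mismatches = 0;

    for (size_t i = 0; i < 256; i++)
        in[i] = (uint8_t)i;

    shl1_u8(shifted, in, 256);
    add_self_u8(added, in, 256);

    for (size_t i = 0; i < 256; i++)
        if (shifted[i] != added[i])
            mismatches++;

    return mismatches;
}

/* Exhaustive check: the two forms agree for all 256 inputs.  */
static void self_check(void)
{
    assert(count_mismatches_u8() == 0);
}
```

The same argument holds for wider lanes, since x << 1 and x + x are both 2x
reduced modulo the lane width.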

The implementation of this change for SVE vectors is similar to a prior patch
<https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659958.html> that adds
the above functionality for Neon vectors.

Here, the machine description pattern is split up to separately accommodate left
and right shifts, so we can specifically emit an add for all left shifts by 1.

The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
OK for mainline?

Signed-off-by: Soumya AR <soum...@nvidia.com>

gcc/ChangeLog:

        * config/aarch64/aarch64-sve.md (*post_ra_v<optab><mode>3): Split
        pattern to accommodate left and right shifts separately.
        (*post_ra_v_ashl<mode>3): Matches left shifts with additional
        constraint to check for shifts by 1.
        (*post_ra_v_<optab><mode>3): Matches right shifts.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Updated instances of
        lsl-1 with corresponding add.
        * gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
        * gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
        * gcc.target/aarch64/sve/adr_1.c: Likewise.
        * gcc.target/aarch64/sve/adr_6.c: Likewise.
        * gcc.target/aarch64/sve/cond_mla_7.c: Likewise.
        * gcc.target/aarch64/sve/cond_mla_8.c: Likewise.
        * gcc.target/aarch64/sve/shift_2.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather Likewise.
        * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/rshl_s16.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/rshl_s32.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/rshl_s64.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/rshl_s8.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/rshl_u16.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/rshl_u32.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/rshl_u64.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/rshl_u8.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
        * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
        * gcc.target/aarch64/sve/sve_shl_add.c: New test.

Attachment: 0001-aarch64-Emit-ADD-X-Y-Y-instead-of-SHL-X-Y-1-for-SVE-.patch