> -----Original Message-----
> From: Gcc-patches <gcc-patches-boun...@gcc.gnu.org> On Behalf Of
> Kyrylo Tkachov via Gcc-patches
> Sent: 16 February 2021 15:20
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] aarch64: Add internal tune flag to minimise VL-based scalar
> ops
> 
> Hi all,
> 
> This patch introduces an internal tune flag to break up VL-based scalar ops
> into a GP-reg scalar op with the VL read kept separate. This can be preferable
> on some CPUs.
> 
> I went for a tune param rather than extending the RTX costs, as our RTX
> cost tables aren't set up to track this intricacy.
> 
> I've confirmed that on the simple loop:
> void vadd (int *dst, int *op1, int *op2, int count)
> {
>   for (int i = 0; i < count; ++i)
>     dst[i] = op1[i] + op2[i];
> }
> 
> we now split the incw into a cntw outside the loop and the add inside.
> 
> +       cntw    x5
> ...
> loop:
> -       incw    x4
> +       add     x4, x4, x5
> 
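> To illustrate the effect at the C level: the transformation is essentially
> loop-invariant hoisting of the vector-length read. The sketch below is only
> an analogue, not the compiler's code; read_vl_words() is a hypothetical
> stand-in for what the cntw instruction computes on real hardware.
>
> #include <stdio.h>
>
> /* Hypothetical stand-in for reading the SVE vector length in 32-bit
>    words (what cntw yields); a single instruction on real hardware. */
> static unsigned long read_vl_words (void) { return 4; /* e.g. 128-bit VL */ }
>
> int main (void)
> {
>   unsigned long x4 = 0;
>
>   /* Before the split, each iteration uses a VL-based increment (incw x4).
>      After the split, the VL is read once outside the loop (cntw x5) and
>      the loop body becomes a plain GP-register add (add x4, x4, x5).  */
>   unsigned long x5 = read_vl_words (); /* hoisted cntw */
>   for (int i = 0; i < 3; ++i)
>     x4 += x5;                          /* add x4, x4, x5 */
>
>   printf ("%lu\n", x4);               /* 3 iterations * 4 words */
>   return 0;
> }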
> Bootstrapped and tested on aarch64-none-linux-gnu.
> This is a minimally invasive fix that helps performance only for
> -mcpu=neoverse-v1 in GCC 11, so I'd like to get it into stage4 if possible,
> but I'd appreciate some feedback on its risk assessment.

After some offline discussion and evaluation with Richard Sandiford, we are 
happy for the patch to go in now, as it's a minimal codegen change isolated to 
a specific non-default tuning, so I've pushed it.
Thanks,
Kyrill

> 
> Thanks,
> Kyrill
> 
> gcc/ChangeLog:
> 
>       * config/aarch64/aarch64-tuning-flags.def (cse_sve_vl_constants):
>       Define.
>       * config/aarch64/aarch64.md (add<mode>3): Force
> CONST_POLY_INT immediates
>       into a register when the above is enabled.
>       * config/aarch64/aarch64.c (neoversev1_tunings):
>       AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.
>       (aarch64_rtx_costs): Use
> AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/aarch64/sve/cse_sve_vl_constants_1.c: New test.
