Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction

Soumya AR Wed, 02 Oct 2024 22:38:28 -0700

Hi Saurabh,

> On 30 Sep 2024, at 10:36 PM, Saurabh Jha <saurabh....@arm.com> wrote:
> 
> 
> Hi Soumya,
> 
> Thank you for the patch. Two clarifications:
> 
> In the instruction pattern's output string, why did you add the 'Z'
> prefix before operands? (%0 -> %Z0).


The 'Z' modifier makes sure that the registers are printed with their SVE
names (z0 rather than v0).

Normally, %n prints operand 'n' (here, the register assigned to it) using
the PRINT_OPERAND target hook, as described here:

https://gcc.gnu.org/onlinedocs/gccint/Output-Template.html

Writing %<char>n passes the modifier character to that same hook, which
can then print the operand in a different form.

Most SVE patterns don't need a modifier, because their operands already
have SVE vector modes. Here, however, the operands have scalar
floating-point modes (SF/DF) at the RTL level, so without the 'Z'
modifier the hook would print something like:

fscale v0.s, p7/m, v0.s, v31.s

which is not valid SVE syntax.
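
To make the mechanism concrete, here is a toy C model of output-template
expansion. It is not GCC's code: toy_print_operand, toy_output_template and
the operand kinds are hypothetical stand-ins for the target's print_operand
hook, and the template strings are illustrative rather than the ones in the
patch. A plain %n calls the hook with code 0, while %Zn calls the same hook
with code 'Z', which switches FP/SIMD registers from v-names to z-names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical operand kinds for this toy model.  */
enum op_kind { OP_FP_REG, OP_PRED_REG };

struct operand
{
  enum op_kind kind;
  unsigned regno;
};

/* Toy stand-in for a target print_operand hook.  CODE is the modifier
   letter from "%<char>n", or 0 for a plain "%n".  */
static int
toy_print_operand (char *out, size_t size, const struct operand *op, int code)
{
  if (op->kind == OP_PRED_REG)
    return snprintf (out, size, "p%u", op->regno);
  /* FP/SIMD register: the 'Z' modifier selects the SVE name (zN);
     with no modifier the Advanced SIMD name (vN) is printed.  */
  return snprintf (out, size, "%c%u", code == 'Z' ? 'z' : 'v', op->regno);
}

/* Expand TMPL into OUT, replacing "%n" / "%<letter>n" (single-digit n)
   by calling the hook above for operand n.  */
static void
toy_output_template (char *out, size_t size, const char *tmpl,
                     const struct operand *ops)
{
  size_t len = 0;
  const char *p = tmpl;

  assert (size > 0);
  while (*p != '\0' && len + 1 < size)
    {
      if (*p != '%')
        {
          out[len++] = *p++;
          continue;
        }
      p++;                                    /* skip '%' */
      int code = 0;
      if ((*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z'))
        code = *p++;                          /* optional modifier letter */
      assert (*p >= '0' && *p <= '9');        /* operand number */
      int n = toy_print_operand (out + len, size - len, &ops[*p - '0'], code);
      p++;
      if (n < 0 || (size_t) n >= size - len)
        {
          len = size - 1;                     /* output was truncated */
          break;
        }
      len += (size_t) n;
    }
  out[len] = '\0';
}
```

With operands {v0, p7, v31}, expanding "fscale %Z0.s, %1/m, %Z0.s, %Z2.s"
gives "fscale z0.s, p7/m, z0.s, z31.s", while dropping the Z modifiers gives
the v-register form quoted above.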
 
> Also, maybe you can make your test cases more precise by specifying
> which functions generate which instructions. I don't have an SVE test
> off the top of my head but have a look at
> /gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c
> for example.

Thanks for the suggestion! I'll update the test case accordingly.
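
A check-function-bodies version of the test could look roughly like the
sketch below. It assumes the file sits in gcc.target/aarch64/sve so that the
directory's harness enables SVE, and it simply transcribes the output quoted
in the patch below; register numbers are regexps (with back-references tying
the predicate and the exponent register together) because register
allocation and scheduling may differ from what is written here.

```c
/* { dg-do compile } */
/* { dg-options "-Ofast" } */
/* { dg-final { check-function-bodies "**" "" "" } } */

/*
** test_ldexpf:
**	fmov	s([0-9]+), w0
**	ptrue	(p[0-7])\.b, all
**	fscale	z0\.s, \2/m, z0\.s, z\1\.s
**	ret
*/
float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

/*
** test_ldexp:
**	sxtw	x0, w0
**	ptrue	(p[0-7])\.b, all
**	fmov	d([0-9]+), x0
**	fscale	z0\.d, \1/m, z0\.d, z\2\.d
**	ret
*/
double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}
```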

Regards,
Soumya

> Regards,
> Saurabh
> 
> 
> 
> On 9/30/2024 5:26 PM, Soumya AR wrote:
>> This patch uses the FSCALE instruction provided by SVE to implement the
>> standard ldexp family of functions.
>> 
>> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
>> following code:
>> 
>> float
>> test_ldexpf (float x, int i)
>> {
>>      return __builtin_ldexpf (x, i);
>> }
>> 
>> double
>> test_ldexp (double x, int i)
>> {
>>      return __builtin_ldexp (x, i);
>> }
>> 
>> GCC Output:
>> 
>> test_ldexpf:
>>      b ldexpf
>> 
>> test_ldexp:
>>      b ldexp
>> 
>> Since SVE provides an FSCALE instruction, we can use it for scalar
>> floats too: the operands are placed in SVE vector registers and a single
>> predicated FSCALE performs the scaling, similar to how LLVM handles the
>> ldexp builtin.
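
A portable C model of what the sequence computes may help here. It is a
sketch with hypothetical names, assuming IEEE-754 binary32 floats and, for
simplicity, exponents in [-126, 127]; the real FSCALE also handles larger
exponents and special values, which this model does not. Each active lane
computes x * 2^i, inactive lanes keep their old value (the /m merging
predication), and the scalar case only uses lane 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_LANES 4  /* hypothetical vector length: 4 x 32-bit lanes */

/* x * 2^i for i in [-126, 127].  2^i is built directly from its IEEE-754
   bit pattern, so the only rounding happens in the final multiply.  */
static float
toy_scale_f32 (float x, int32_t i)
{
  assert (i >= -126 && i <= 127);
  uint32_t bits = (uint32_t) (i + 127) << 23;
  float pow2;
  memcpy (&pow2, &bits, sizeof pow2);
  return x * pow2;
}

/* Model of "fscale zdn.s, pg/m, zdn.s, zm.s": active lanes are scaled,
   inactive lanes keep their previous contents (merging predication).  */
static void
toy_fscale_m (float zdn[TOY_LANES], const bool pg[TOY_LANES],
              const int32_t zm[TOY_LANES])
{
  for (int k = 0; k < TOY_LANES; k++)
    if (pg[k])
      zdn[k] = toy_scale_f32 (zdn[k], zm[k]);
}

/* Scalar ldexpf through lane 0, mirroring the sequence in the patch.  */
static float
toy_ldexpf_via_fscale (float x, int32_t i)
{
  float z0[TOY_LANES] = { x };       /* x arrives in s0, the low lane of z0 */
  int32_t z31[TOY_LANES] = { i };    /* fmov s31, w0 */
  bool p7[TOY_LANES] = { true, true, true, true };  /* ptrue p7.b, all */
  toy_fscale_m (z0, p7, z31);
  return z0[0];                      /* result is read back from s0 */
}
```

Only lane 0 is read back, so in the scalar case the values in the other
lanes do not affect the result.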
>> 
>> New Output:
>> 
>> test_ldexpf:
>>      fmov s31, w0
>>      ptrue p7.b, all
>>      fscale z0.s, p7/m, z0.s, z31.s
>>      ret
>> 
>> test_ldexp:
>>      sxtw x0, w0
>>      ptrue p7.b, all
>>      fmov d31, x0
>>      fscale z0.d, p7/m, z0.d, z31.d
>>      ret
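
The extra sxtw in the double-precision sequence is needed because FSCALE on
.d elements reads a 64-bit signed exponent from each lane of z31, while the
int argument only occupies w0 (the upper half of x0 is unspecified under
AAPCS64). The sketch below, with hypothetical helper names, contrasts the
sign-extended exponent with what a lane would hold if the upper 32 bits were
simply zero.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "sxtw x0, w0": the 32-bit exponent is sign-extended, so a
   negative int stays negative in the 64-bit lane.  */
static int64_t
toy_exponent_sxtw (int32_t w)
{
  return (int64_t) w;
}

/* Hypothetical alternative for comparison: zero-extending the 32-bit
   value turns a negative exponent into a large positive one.  */
static int64_t
toy_exponent_zero_extend (int32_t w)
{
  return (int64_t) (uint32_t) w;
}
```

For the single-precision case no extension is needed, since fmov s31, w0
moves exactly the 32 bits that each .s lane reads.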
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
>> OK for mainline?
>> 
>> Signed-off-by: Soumya AR <soum...@nvidia.com>
>> 
>> gcc/ChangeLog:
>> 
>> * config/aarch64/aarch64-sve.md
>> (ldexp<mode>3): New pattern to match ldexp calls with scalar
>> floating-point modes and expand them to the existing FSCALE pattern.
>> (@aarch64_pred_<optab><mode>): Extend the pattern to accept scalar
>> floating-point modes as well as SVE modes.
>>
>> * config/aarch64/iterators.md (SVE_FULL_F_SCALAR): New iterator
>> matching all FP SVE modes as well as SF and DF.
>> (VPRED): Extend the attribute to handle GPF modes.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.target/aarch64/sve/fscale.c: New test.
>> 
>
