Re: [PATCH v2] aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]

Soumya AR Mon, 11 Nov 2024 01:05:18 -0800

Hi Richard, 

> On 7 Nov 2024, at 3:19 PM, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> Soumya AR <soum...@nvidia.com> writes:
>> Changes since v1:
>> 
>> This revision makes use of the extended definition of aarch64_ptrue_reg to
>> generate predicate registers with the appropriate set bits.
>> 
>> Earlier, there was a suggestion to add support for half floats as well. I
>> extended the patch to include HFs but GCC still emits a libcall for ldexpf16.
>> For example, in the following case, the call does not lower to fscale:
>> 
>> _Float16 test_ldexpf16 (_Float16 x, int i) {
>>      return __builtin_ldexpf16 (x, i);
>> }
>> 
>> Any suggestions as to why this may be?
> 
> You'd need to change:
> 
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 2d455938271..469835b1d62 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -441,7 +441,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> 
> /* FP scales.  */
> -DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> +DEF_INTERNAL_FLT_FLOATN_FN (LDEXP, ECF_CONST, ldexp, binary)
> 
> /* Ternary math functions.  */
> DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)


Thanks for this! It works for FP16 now.

> 
> A couple of comments below, but otherwise it looks good:
> 
>> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
>> index 0bc98315bb6..7f708ea14f9 100644
>> --- a/gcc/config/aarch64/iterators.md
>> +++ b/gcc/config/aarch64/iterators.md
>> @@ -449,6 +449,9 @@
>> ;; All fully-packed SVE floating-point vector modes.
>> (define_mode_iterator SVE_FULL_F [VNx8HF VNx4SF VNx2DF])
>> 
>> +;; Fully-packed SVE floating-point vector modes and 32-bit and 64-bit floats.
>> +(define_mode_iterator SVE_FULL_F_SCALAR [VNx8HF VNx4SF VNx2DF HF SF DF])
> 
> The comment is out of date.  How about:
> 
> ;; Fully-packed SVE floating-point vector modes and their scalar equivalents.
> (define_mode_iterator SVE_FULL_F_SCALAR [SVE_FULL_F GPF_HF])

Edited.

> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fscale.c b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
>> new file mode 100644
>> index 00000000000..251b4ef9188
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +float
>> +test_ldexpf (float x, int i)
>> +{
>> +  return __builtin_ldexpf (x, i);
>> +}
>> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>> +
>> +double
>> +test_ldexp (double x, int i)
>> +{
>> +  return __builtin_ldexp (x, i);
>> +}
>> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
> 
> It would be good to check the ptrues as well, to make sure that we only
> enable one lane.
> 
Makes sense. I’ve changed the test case to use check-function-bodies instead,
so that we can check the ptrues as well.
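
For context, here is a minimal sketch of the scalar semantics the tests
rely on: ldexp (x, i) computes x * 2^i, which is the per-lane scaling that
FSCALE performs. The helper names below (scale_by_pow2,
ldexp_matches_reference, check_ldexp_semantics) are hypothetical and are
not part of the patch or the testsuite. The sketch only covers inputs
whose results stay in the normal range.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical reference helper (not part of the patch): scale X by 2^I
   using repeated multiplication.  This is exact as long as no
   intermediate result overflows or becomes subnormal.  */
static double
scale_by_pow2 (double x, int i)
{
  for (; i > 0; i--)
    x *= 2.0;
  for (; i < 0; i++)
    x *= 0.5;
  return x;
}

/* Return nonzero if ldexp (X, I) agrees with the reference helper.  */
static int
ldexp_matches_reference (double x, int i)
{
  return ldexp (x, i) == scale_by_pow2 (x, i);
}

/* Small self-check over a few normal-range inputs.  */
static void
check_ldexp_semantics (void)
{
  assert (ldexp_matches_reference (1.5, 4));
  assert (ldexp_matches_reference (-3.25, -2));
  assert (ldexp (0.75, 3) == 6.0);
}
```

Calling check_ldexp_semantics () from a small driver is enough to exercise
it. This checks only the arithmetic meaning of ldexp and says nothing about
which instructions the compiler emits.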

Best,
Soumya

> Thanks,
> Richard

0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch
