Re: [PATCH v2] aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]

Richard Sandiford Thu, 07 Nov 2024 01:49:59 -0800

Soumya AR <soum...@nvidia.com> writes:
> Changes since v1:
>
> This revision makes use of the extended definition of aarch64_ptrue_reg to
> generate predicate registers with the appropriate set bits.
>
> Earlier, there was a suggestion to add support for half floats as well. I
> extended the patch to include HFs but GCC still emits a libcall for ldexpf16.
> For example, in the following case, the call does not lower to fscale:
>
> _Float16 test_ldexpf16 (_Float16 x, int i) {
>       return __builtin_ldexpf16 (x, i);
> }
>
> Any suggestions as to why this may be?


You'd need to change:

diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 2d455938271..469835b1d62 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -441,7 +441,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)

 /* FP scales.  */
-DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
+DEF_INTERNAL_FLT_FLOATN_FN (LDEXP, ECF_CONST, ldexp, binary)

 /* Ternary math functions.  */
 DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
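
As I understand the comments in internal-fn.def, the two macros link the internal function to different builtins:

- DEF_INTERNAL_FLT_FN links it only to BUILT_IN_<NAME>{F,,L}.
- DEF_INTERNAL_FLT_FLOATN_FN also links it to the _FloatN/_FloatNx variants, such as BUILT_IN_LDEXPF16.

Without that link, __builtin_ldexpf16 presumably never becomes IFN_LDEXP, so the ldexp optab pattern is never tried. That would fit the libcall you saw. Below is a much-simplified, hypothetical model of the dispatch. None of the names (B_LDEXPF16, map_flt, CASE_FLOATN, ...) are GCC's; they only illustrate the X-macro idea.

```c
#include <assert.h>

/* Hypothetical stand-ins for the builtin and internal-function enums.  */
enum builtin { B_LDEXPF, B_LDEXP, B_LDEXPL, B_LDEXPF16 };
enum ifn { IFN_NONE, IFN_LDEXP };

/* "FLT" family: float, double and long double variants only.  */
#define CASE_FLT(N) case N##F: case N: case N##L
/* "FLOATN" family: the _FloatN variants (only _Float16 modelled here).  */
#define CASE_FLOATN(N) case N##F16

/* Mapping as it would be with DEF_INTERNAL_FLT_FN.  */
static enum ifn
map_flt (enum builtin b)
{
  switch (b)
    {
    CASE_FLT (B_LDEXP):
      return IFN_LDEXP;
    default:
      return IFN_NONE;   /* no internal function, so the call stays a call */
    }
}

/* Mapping as it would be with DEF_INTERNAL_FLT_FLOATN_FN.  */
static enum ifn
map_flt_floatn (enum builtin b)
{
  switch (b)
    {
    CASE_FLT (B_LDEXP):
    CASE_FLOATN (B_LDEXP):
      return IFN_LDEXP;
    default:
      return IFN_NONE;
    }
}

static void
check_mapping (void)
{
  assert (map_flt (B_LDEXPF) == IFN_LDEXP);
  assert (map_flt (B_LDEXPF16) == IFN_NONE);        /* the ldexpf16 case */
  assert (map_flt_floatn (B_LDEXPF16) == IFN_LDEXP);
}
```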

A couple of comments below, but otherwise it looks good:

> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 0bc98315bb6..7f708ea14f9 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -449,6 +449,9 @@
>  ;; All fully-packed SVE floating-point vector modes.
>  (define_mode_iterator SVE_FULL_F [VNx8HF VNx4SF VNx2DF])
>  
> +;; Fully-packed SVE floating-point vector modes and 32-bit and 64-bit floats.
> +(define_mode_iterator SVE_FULL_F_SCALAR [VNx8HF VNx4SF VNx2DF HF SF DF])

The comment is out of date.  How about:

;; Fully-packed SVE floating-point vector modes and their scalar equivalents.
(define_mode_iterator SVE_FULL_F_SCALAR [SVE_FULL_F GPF_HF])

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fscale.c b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
> new file mode 100644
> index 00000000000..251b4ef9188
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast" } */
> +
> +float
> +test_ldexpf (float x, int i)
> +{
> +  return __builtin_ldexpf (x, i);
> +}
> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +
> +double
> +test_ldexp (double x, int i)
> +{
> +  return __builtin_ldexp (x, i);
> +} 
> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
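
The scans above only check the assembly. If a run-time test were also wanted, something along these lines could compare the results against an exact reference, x * 2^i. It uses small exponents, where scaling by a power of two is exact.

This is a hypothetical sketch, not part of the patch. The names scale_ref and run_checks are illustrative. The noinline wrappers just keep the calls out of line rather than letting them be inlined and folded.

```c
#include <assert.h>

/* Reference result: multiply or divide by 2 exactly |i| times.  */
static double
scale_ref (double x, int i)
{
  for (; i > 0; i--)
    x *= 2.0;
  for (; i < 0; i++)
    x /= 2.0;
  return x;
}

__attribute__ ((noinline)) static float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

__attribute__ ((noinline)) static double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}

/* Returns the number of (x, i) pairs checked.  */
static int
run_checks (void)
{
  static const double xs[] = { 1.0, -3.5, 0.15625, 1e10 };
  int n = 0;
  for (unsigned k = 0; k < sizeof xs / sizeof xs[0]; k++)
    for (int i = -8; i <= 8; i++, n++)
      {
        double x = xs[k];
        float xf = (float) x;
        /* Every value stays normal in this range, so both sides are
           exact and can be compared with ==.  */
        assert (test_ldexp (x, i) == scale_ref (x, i));
        assert (test_ldexpf (xf, i) == (float) scale_ref (xf, i));
      }
  return n;
}
```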

It would be good to check the ptrues as well, to make sure that we only
enable one lane.
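
To make "only one lane" concrete, here is a hypothetical scalar model of a merging-predicated FSCALE (Zdn, Pg/M, Zdn, Zm) on four .s lanes:

- Active lanes are scaled by 2^Zm.
- Inactive lanes keep their Zdn value.

With a one-lane predicate, only lane 0 (which holds the scalar argument) is computed. Whatever else is in the register is left untouched, so, for example, a large value there is not scaled up to infinity. The names fscale_m, LANES and demo_single_lane are made up for the sketch. The ptrue scan itself would still need to be written against the exact assembly that GCC emits.

```c
#include <assert.h>
#include <stdbool.h>

#define LANES 4  /* e.g. a 128-bit vector of .s elements */

/* Merging predication: inactive lanes keep their previous zdn value.  */
static void
fscale_m (const bool pg[LANES], float zdn[LANES], const int zm[LANES])
{
  for (int l = 0; l < LANES; l++)
    if (pg[l])
      zdn[l] = __builtin_ldexpf (zdn[l], zm[l]);  /* zdn * 2^zm */
}

static void
demo_single_lane (void)
{
  const bool one_lane[LANES] = { true, false, false, false };
  const int zm[LANES] = { 2, 200, 1, 1 };
  /* Lane 0 holds the scalar x; lanes 1-3 stand in for unrelated data.  */
  float zdn[LANES] = { 3.0f, 1e30f, 5.0f, 7.0f };

  fscale_m (one_lane, zdn, zm);
  assert (zdn[0] == 12.0f);   /* 3 * 2^2 */
  assert (zdn[1] == 1e30f);   /* untouched, not scaled to infinity */
  assert (zdn[2] == 5.0f && zdn[3] == 7.0f);
}
```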

Thanks,
Richard
