Hi Richard, > On 7 Nov 2024, at 3:19 PM, Richard Sandiford <richard.sandif...@arm.com> > wrote: > > External email: Use caution opening links or attachments > > > Soumya AR <soum...@nvidia.com> writes: >> Changes since v1: >> >> This revision makes use of the extended definition of aarch64_ptrue_reg to >> generate predicate registers with the appropriate set bits. >> >> Earlier, there was a suggestion to add support for half floats as well. I >> extended the patch to include HFs but GCC still emits a libcall for ldexpf16. >> For example, in the following case, the call does not lower to fscale: >> >> _Float16 test_ldexpf16 (_Float16 x, int i) { >> return __builtin_ldexpf16 (x, i); >> } >> >> Any suggestions as to why this may be? > > You'd need to change: > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > index 2d455938271..469835b1d62 100644 > --- a/gcc/internal-fn.def > +++ b/gcc/internal-fn.def > @@ -441,7 +441,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, > vec_fmaddsub, ternary) > DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary) > > /* FP scales. */ > -DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) > +DEF_INTERNAL_FLT_FLOATN_FN (LDEXP, ECF_CONST, ldexp, binary) > > /* Ternary math functions. */ > DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
Thanks for this! It works for FP16 now. > > A couple of comments below, but otherwise it looks good: > >> diff --git a/gcc/config/aarch64/iterators.md >> b/gcc/config/aarch64/iterators.md >> index 0bc98315bb6..7f708ea14f9 100644 >> --- a/gcc/config/aarch64/iterators.md >> +++ b/gcc/config/aarch64/iterators.md >> @@ -449,6 +449,9 @@ >> ;; All fully-packed SVE floating-point vector modes. >> (define_mode_iterator SVE_FULL_F [VNx8HF VNx4SF VNx2DF]) >> >> +;; Fully-packed SVE floating-point vector modes and 32-bit and 64-bit >> floats. >> +(define_mode_iterator SVE_FULL_F_SCALAR [VNx8HF VNx4SF VNx2DF HF SF DF]) > > The comment is out of date. How about: > > ;; Fully-packed SVE floating-point vector modes and their scalar equivalents. > (define_mode_iterator SVE_FULL_F_SCALAR [SVE_FULL_F GPF_HF]) Edited. > >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fscale.c >> b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c >> new file mode 100644 >> index 00000000000..251b4ef9188 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c >> @@ -0,0 +1,16 @@ >> +/* { dg-do compile } */ >> +/* { dg-additional-options "-Ofast" } */ >> + >> +float >> +test_ldexpf (float x, int i) >> +{ >> + return __builtin_ldexpf (x, i); >> +} >> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s, p[0-7]/m, >> z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >> + >> +double >> +test_ldexp (double x, int i) >> +{ >> + return __builtin_ldexp (x, i); >> +} >> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.d, p[0-7]/m, >> z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */ > > It would be good to check the ptrues as well, to make sure that we only > enable one lane. > Makes sense, I’ve changed the test case to use check-function-bodies instead so we can check for ptrues as well. Best, Soumya > Thanks, > Richard
0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch
Description: 0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch