Soumya AR <soum...@nvidia.com> writes:
> Changes since v1:
>
> This revision makes use of the extended definition of aarch64_ptrue_reg to
> generate predicate registers with the appropriate set bits.
>
> Earlier, there was a suggestion to add support for half floats as well. I
> extended the patch to include HFs but GCC still emits a libcall for ldexpf16.
> For example, in the following case, the call does not lower to fscale:
>
> _Float16 test_ldexpf16 (_Float16 x, int i) {
>   return __builtin_ldexpf16 (x, i);
> }
>
> Any suggestions as to why this may be?

You'd need to change:

diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 2d455938271..469835b1d62 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -441,7 +441,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)

 /* FP scales. */
-DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
+DEF_INTERNAL_FLT_FLOATN_FN (LDEXP, ECF_CONST, ldexp, binary)

 /* Ternary math functions. */
 DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)

A couple of comments below, but otherwise it looks good:

> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 0bc98315bb6..7f708ea14f9 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -449,6 +449,9 @@
>  ;; All fully-packed SVE floating-point vector modes.
>  (define_mode_iterator SVE_FULL_F [VNx8HF VNx4SF VNx2DF])
>
> +;; Fully-packed SVE floating-point vector modes and 32-bit and 64-bit floats.
> +(define_mode_iterator SVE_FULL_F_SCALAR [VNx8HF VNx4SF VNx2DF HF SF DF])

The comment is out of date. How about:

;; Fully-packed SVE floating-point vector modes and their scalar equivalents.
(define_mode_iterator SVE_FULL_F_SCALAR [SVE_FULL_F GPF_HF])

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fscale.c b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
> new file mode 100644
> index 00000000000..251b4ef9188
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast" } */
> +
> +float
> +test_ldexpf (float x, int i)
> +{
> +  return __builtin_ldexpf (x, i);
> +}
> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +
> +double
> +test_ldexp (double x, int i)
> +{
> +  return __builtin_ldexp (x, i);
> +}
> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */

It would be good to check the ptrues as well, to make sure that we only
enable one lane.

Thanks,
Richard
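
For illustration only, the ptrue check suggested above might be sketched roughly as follows for the float case. The ptrue pattern is an assumption: the exact element suffix and vl count that GCC emits for a single active lane are not stated in this thread, so the regex is deliberately loose and would need tightening against the real assembly output.

```c
/* { dg-do compile } */
/* { dg-additional-options "-Ofast" } */

float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

/* Assumed pattern: a single ptrue with an explicit "vlN" constraint, so
   that only the first lane is active.  The suffix and N are guesses and
   should be pinned down once the generated code has been inspected.  */
/* { dg-final { scan-assembler-times {\tptrue\tp[0-7]\.[bhsd], vl[0-9]+\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
```

Matching "vl[0-9]+" rather than a fixed count keeps the sketch independent of how the predicate is materialised; a real test would want the exact count, since a ptrue with more lanes enabled would also match here.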