Changes since v1: This revision makes use of the extended definition of aarch64_ptrue_reg to generate predicate registers with the appropriate set bits.
Earlier, there was a suggestion to add support for half floats as well. I
extended the patch to include HFs but GCC still emits a libcall for ldexpf16.
For example, in the following case, the call does not lower to fscale:

_Float16 test_ldexpf16 (_Float16 x, int i) {
        return __builtin_ldexpf16 (x, i);
}

Any suggestions as to why this may be?

---

This patch uses the FSCALE instruction provided by SVE to implement the
standard ldexp family of functions.

Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
following code:

float
test_ldexpf (float x, int i)
{
        return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
        return __builtin_ldexp(x, i);
}

GCC Output:

test_ldexpf:
        b       ldexpf

test_ldexp:
        b       ldexp

Since SVE has support for an FSCALE instruction, we can use this to process
scalar floats by moving them to a vector register and performing an fscale
call, similar to how LLVM tackles an ldexp builtin as well.

New Output:

test_ldexpf:
        fmov    s31, w0
        ptrue   p7.b, vl4
        fscale  z0.s, p7/m, z0.s, z31.s
        ret

test_ldexp:
        sxtw    x0, w0
        ptrue   p7.b, vl8
        fmov    d31, x0
        fscale  z0.d, p7/m, z0.d, z31.d
        ret

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR <soum...@nvidia.com>

gcc/ChangeLog:

        PR target/111733
        * config/aarch64/aarch64-sve.md
        (ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
        floating modes and expand to the existing pattern for FSCALE.
        (@aarch64_pred_<optab><mode>): Extended the pattern to accept SVE
        operands as well as scalar floating modes.
        * config/aarch64/iterators.md (SVE_FULL_F_SCALAR): Added an iterator
        to match all FP SVE modes as well as HF, SF, and DF.
        (VPRED): Extended the attribute to handle GPF_HF modes.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/fscale.c: New test.
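
For reference, here is a minimal sketch of the behaviour the new expansion has
to preserve: ldexp (x, i) computes x * 2^i, and for the values below the result
should be the same whether the call is a libcall or an FSCALE. This is not the
contents of fscale.c; the helper name check_ldexp_semantics and the chosen
inputs are assumptions made for illustration only. It relies on the GCC/Clang
builtins already used above.

```c
#include <assert.h>

/* Same shape as the functions quoted in the description above.  */
float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}

/* Hypothetical helper (not part of the patch): checks a few exactly
   representable cases of x * 2^i and returns 1 if all of them hold.  */
int
check_ldexp_semantics (void)
{
  assert (test_ldexpf (1.5f, 3) == 12.0f);   /* 1.5 * 2^3   */
  assert (test_ldexpf (8.0f, -3) == 1.0f);   /* 8.0 * 2^-3  */
  assert (test_ldexp (3.0, -2) == 0.75);     /* 3.0 * 2^-2  */
  assert (test_ldexp (-1.0, 10) == -1024.0); /* sign is kept */
  assert (test_ldexp (0.0, 100) == 0.0);     /* zero stays zero */
  return 1;
}
```

Calling check_ldexp_semantics () from a small driver, built once with and once
without the patch, would be one way to sanity-check that the two code paths
agree on these inputs. It deliberately avoids overflow, underflow and subnormal
cases, whose behaviour under -Ofast is not covered here.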
0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch