[PATCH v2] aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]

Soumya AR Sun, 03 Nov 2024 20:21:20 -0800

Changes since v1:

This revision uses the extended definition of aarch64_ptrue_reg to generate
predicate registers with the appropriate bits set.
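
To illustrate what "appropriate bits" means, here is a minimal C model of SVE predicate layout. It is a sketch under stated assumptions: a 128-bit vector (VL_BYTES = 16), hypothetical helper names (ptrue_b_vl, elem_active, active_count), and no modelling of what happens when the requested count exceeds the vector length. A predicate holds one bit per byte of the vector, and an element is active when the bit for its lowest-numbered byte is set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed vector length for this model: 128 bits = 16 bytes.  */
#define VL_BYTES 16

/* Hypothetical model of "ptrue pN.b, vlK": set the first K byte bits.  */
static uint64_t ptrue_b_vl(unsigned k)
{
  return k >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << k) - 1;
}

/* Element I of size ELEM_BYTES is governed by the bit of its lowest byte.  */
static bool elem_active(uint64_t pred, size_t elem_bytes, size_t i)
{
  return (pred >> (i * elem_bytes)) & 1;
}

/* Count the active elements when PRED is read with ELEM_BYTES-sized lanes.  */
static size_t active_count(uint64_t pred, size_t elem_bytes)
{
  size_t n = 0;
  for (size_t i = 0; i < VL_BYTES / elem_bytes; i++)
    n += elem_active(pred, elem_bytes, i);
  return n;
}
```

In this model, `ptrue p7.b, vl4` read as .s lanes enables only lane 0, and `ptrue p7.b, vl8` read as .d lanes also enables only lane 0; vl8 read as .s lanes would enable two lanes, which is why the single-precision case needs vl4.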


Earlier, it was suggested that half-precision floats be supported as well. I
extended the patch to cover HF modes, but GCC still emits a libcall for
ldexpf16. For example, the call in the following function is not lowered to
FSCALE:

_Float16 test_ldexpf16 (_Float16 x, int i) {
        return __builtin_ldexpf16 (x, i);
}

Any suggestions as to why this may be?

——

This patch uses the FSCALE instruction provided by SVE to implement the
standard ldexp family of functions.
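
For reference, ldexp(x, i) returns x multiplied by 2 raised to the power i; this is the behaviour FSCALE has to reproduce. The sketch below uses hypothetical helper names (scale_by_pow2f, scale_by_repeated_mulf) and compares the library call with repeated doubling or halving, which is exact only while every intermediate stays in the normal range.

```c
#include <math.h>

/* ldexpf(x, i) computes x * 2^i; for normal results only the exponent
   changes, so the operation is exact.  */
float scale_by_pow2f(float x, int i)
{
  return ldexpf(x, i);
}

/* Reference by repeated doubling/halving, for comparison only.  Exact
   while every intermediate value stays in the normal range.  */
float scale_by_repeated_mulf(float x, int i)
{
  while (i > 0) { x *= 2.0f; i--; }
  while (i < 0) { x *= 0.5f; i++; }
  return x;
}
```

For example, scale_by_pow2f(3.0f, 4) yields 48.0f, and a result beyond the float range, such as scale_by_pow2f(1.0f, 200), overflows to infinity.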

Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
following code:

float
test_ldexpf (float x, int i)
{
        return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
        return __builtin_ldexp (x, i);
}

GCC Output:

test_ldexpf:
        b ldexpf

test_ldexp:
        b ldexp

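Since SVE provides an FSCALE instruction, we can use it for scalar floats as
well: the scalar value already occupies the lowest element of an SVE register,
the integer exponent is moved into an FP/SIMD register, and FSCALE is executed
under a predicate that enables only that lowest element. This is similar to
how LLVM lowers the ldexp builtin.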
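
A scalar C model of the merging (/m) form of FSCALE on .s lanes may help here. It is a sketch, not the architectural definition: fscale_s_merge_model is a hypothetical name, each active lane is modelled with ldexpf, and NaN, infinity and subnormal edge cases are not checked against the architecture. Active lanes are scaled by 2 to the power of the matching exponent lane; inactive lanes keep their previous value.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "fscale zd.s, pg/m, zd.s, zm.s": each active lane
   of ZD is multiplied by 2^ZM[k]; inactive lanes are left unchanged
   (merging predication).  */
void fscale_s_merge_model(float zd[], const int32_t zm[],
                          const bool pg[], size_t lanes)
{
  for (size_t k = 0; k < lanes; k++)
    if (pg[k])
      zd[k] = ldexpf(zd[k], zm[k]);
}
```

With only lane 0 active, the upper lanes of z0 are left untouched, and the function result is read from s0, which aliases the low 32 bits of z0.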

New Output:

test_ldexpf:
        fmov    s31, w0
        ptrue   p7.b, vl4
        fscale  z0.s, p7/m, z0.s, z31.s
        ret

test_ldexp:
        sxtw    x0, w0
        ptrue   p7.b, vl8
        fmov    d31, x0
        fscale  z0.d, p7/m, z0.d, z31.d
        ret
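
The `sxtw` in the double-precision sequence is needed because the .d form of FSCALE reads each 64-bit lane of the second operand as a signed integer exponent, while the int argument arrives in w0 as 32 bits. The hypothetical helpers below contrast sign extension (what `sxtw` does) with plain zero extension, which would turn a negative exponent into a huge positive one.

```c
#include <stdint.h>

/* Models "sxtw x0, w0": sign-extend the 32-bit exponent to 64 bits.  */
int64_t exponent_sign_extended(int32_t i)
{
  return (int64_t)i;
}

/* Zero extension instead: correct only for non-negative exponents.  */
int64_t exponent_zero_extended(int32_t i)
{
  return (int64_t)(uint32_t)i;
}
```

For i = -3, sign extension keeps -3, while zero extension produces 4294967293; for non-negative exponents the two agree.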

The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
OK for mainline?

Signed-off-by: Soumya AR <soum...@nvidia.com>

gcc/ChangeLog:

        PR target/111733
        * config/aarch64/aarch64-sve.md
        (ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
        floating modes and expand to the existing pattern for FSCALE.
        (@aarch64_pred_<optab><mode>): Extended the pattern to accept SVE
        operands as well as scalar floating modes.

        * config/aarch64/iterators.md
        (SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well
        as HF, SF, and DF.
        (VPRED): Extended the attribute to handle GPF_HF modes.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/fscale.c: New test.
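
The contents of fscale.c are not reproduced in this message. As an illustration only, a compile-only test of this kind might look like the following. The dg-do, dg-options and dg-final scan-assembler directives are standard GCC testsuite directives, but the regular expressions and the assumption that the gcc.target/aarch64/sve harness enables SVE by default are not taken from the patch.

```c
/* { dg-do compile } */
/* { dg-options "-Ofast" } */

float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}

/* Expect one single- and one double-precision FSCALE and no libcall.  */
/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s, p[0-7]/m} 1 } } */
/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.d, p[0-7]/m} 1 } } */
/* { dg-final { scan-assembler-not {\tbl?\tldexpf?\n} } } */
```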

0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch
