[Bug target/111733] Emit inline SVE FSCALE instruction for ldexp

cvs-commit at gcc dot gnu.org via Gcc-bugs Tue, 12 Nov 2024 20:57:41 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111733


--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Soumya AR <soum...@gcc.gnu.org>:

https://gcc.gnu.org/g:9b2915d95d855333d4d8f66b71a75f653ee0d076

commit r15-5188-g9b2915d95d855333d4d8f66b71a75f653ee0d076
Author: Soumya AR <soum...@nvidia.com>
Date:   Wed Nov 13 10:20:14 2024 +0530

    aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]

    This patch uses the FSCALE instruction provided by SVE to implement the
    standard ldexp family of functions.

    Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
    following code:

    float
    test_ldexpf (float x, int i)
    {
            return __builtin_ldexpf (x, i);
    }

    double
    test_ldexp (double x, int i)
    {
            return __builtin_ldexp(x, i);
    }

    GCC Output:

    test_ldexpf:
            b ldexpf

    test_ldexp:
            b ldexp

    Since SVE provides an FSCALE instruction, we can use it to process scalar
    floats by moving them to a vector register and performing an FSCALE there,
    similar to how LLVM handles the ldexp builtin.

    New Output:

    test_ldexpf:
            fmov    s31, w0
            ptrue   p7.b, vl4
            fscale  z0.s, p7/m, z0.s, z31.s
            ret

    test_ldexp:
            sxtw    x0, w0
            ptrue   p7.b, vl8
            fmov    d31, x0
            fscale  z0.d, p7/m, z0.d, z31.d
            ret
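
The inline sequence relies on ldexp (x, i) computing x * 2^i, which is the per-element operation FSCALE performs on active lanes. The block below is a minimal sketch of that scalar semantics, not part of the patch: it reuses the two functions from the commit message, and `scales_by_power_of_two` is a hypothetical helper added only for illustration. It uses inputs whose results are exactly representable, so exact comparison is safe here.

```c
/* Same scalar wrappers as in the commit message. */
float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}

/* Hypothetical helper (not in the patch): returns 1 when ldexp scaled
   each input by the expected power of two, 0 otherwise.  */
int
scales_by_power_of_two (void)
{
  return test_ldexpf (1.5f, 4) == 24.0f     /* 1.5 * 2^4   */
	 && test_ldexp (3.0, -1) == 1.5     /* 3.0 * 2^-1  */
	 && test_ldexp (1.0, 10) == 1024.0; /* 1.0 * 2^10  */
}
```

Compiling these functions with and without the patch should give the same results; only the generated code (libcall versus inline FSCALE) is expected to differ.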

    This is a revision of an earlier patch, and now uses the extended
    definition of aarch64_ptrue_reg to generate predicate registers with the
    appropriate set bits.

    The patch was bootstrapped and regtested on aarch64-linux-gnu, no
    regression.
    OK for mainline?

    Signed-off-by: Soumya AR <soum...@nvidia.com>

    gcc/ChangeLog:

            PR target/111733
            * config/aarch64/aarch64-sve.md
            (ldexp<mode>3): Added a new pattern to match ldexp calls with
            scalar floating modes and expand to the existing pattern for
            FSCALE.
            * config/aarch64/iterators.md:
            (SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes
            as well as their scalar equivalents.
            (VPRED): Extended the attribute to handle GPF_HF modes.
            * internal-fn.def (LDEXP): Changed macro to incorporate ldexpf16.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/sve/fscale.c: New test.
