https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7061

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sa...@gcc.gnu.org>:

https://gcc.gnu.org/g:1753a7120109c1d3b682f9487d6cca64fb2f0929

commit r13-1038-g1753a7120109c1d3b682f9487d6cca64fb2f0929
Author: Roger Sayle <ro...@nextmovesoftware.com>
Date:   Fri Jun 10 15:14:23 2022 +0100

    PR rtl-optimization/7061: Complex number arguments on x86_64-like ABIs.

    This patch addresses the issue in comment #6 of PR rtl-optimization/7061
    (a four digit PR number) from 2006 where on x86_64 complex number arguments
    are unconditionally spilled to the stack.

    For the test cases below:
    float re(float _Complex a) { return __real__ a; }
    float im(float _Complex a) { return __imag__ a; }

    GCC with -O2 currently generates:

    re:     movq    %xmm0, -8(%rsp)
            movss   -8(%rsp), %xmm0
            ret
    im:     movq    %xmm0, -8(%rsp)
            movss   -4(%rsp), %xmm0
            ret

    with this patch we now generate:

    re:     ret
    im:     movq    %xmm0, %rax
            shrq    $32, %rax
            movd    %eax, %xmm0
            ret

    [Technically, this shift can be performed on %xmm0 in a single
    instruction, but the backend needs to be taught to do that, the
    important bit is that the SCmode argument isn't written to the
    stack].

    The patch itself is to emit_group_store where just before RTL
    expansion commits to writing to the stack, we check if the store
    group consists of a single scalar integer register that holds
    a complex mode value; on x86_64 SCmode arguments are passed in
    DImode registers.  If this is the case, we can use a SUBREG to
    "view_convert" the integer to the equivalent complex mode.

    An interesting corner case that showed up during testing is that
    x86_64 also passes HCmode arguments in DImode registers(!), i.e.
    using modes of different sizes.  This is easily handled/supported
    by first converting to an integer mode of the correct size, and
    then generating a complex mode SUBREG of this.  This is similar
    in concept to the patch I proposed here:
    https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html

    2020-06-10  Roger Sayle  <ro...@nextmovesoftware.com>

    gcc/ChangeLog
            PR rtl-optimization/7061
            * expr.cc (emit_group_store): For groups that consist of a single
            scalar integer register that hold a complex mode value, use
            gen_lowpart to generate a SUBREG to "view_convert" to the complex
            mode.  For modes of different sizes, first convert to an integer
            mode of the appropriate size.

    gcc/testsuite/ChangeLog
            PR rtl-optimization/7061
            * gcc.target/i386/pr7061-1.c: New test case.
            * gcc.target/i386/pr7061-2.c: New test case.

Reply via email to