https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116896

--- Comment #29 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:37554bacfd38b1466278b529d9e70a44d7b1b909

commit r15-4105-g37554bacfd38b1466278b529d9e70a44d7b1b909
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Mon Oct 7 10:50:39 2024 +0200

    ssa-math-opts, i386: Improve spaceship expansion [PR116896]

    The PR notes that we don't emit optimal code for C++ spaceship
    operator if the result is returned as an integer rather than the
    result just being compared against different values and different
    code executed based on that.
    So e.g. for
    template <typename T>
    auto foo (T x, T y) { return x <=> y; }
    for both floating point types, signed integer types and unsigned integer
    types.  auto in that case is std::strong_ordering or std::partial_ordering,
    which are fancy C++ abstractions around struct with signed char member
    which is -1, 0, 1 for the strong ordering and -1, 0, 1, 2 for the partial
    ordering (but for -ffast-math 2 is never the case).
    I'm afraid functions like that are fairly common and unless they are
    inlined, we really need to map the comparison to those -1, 0, 1 or
    -1, 0, 1, 2 values.

    Now, for floating point spaceship I've in the past already added an
    optimization (with tree-ssa-math-opts.cc discovery and named optab, the
    optab only defined on x86 though right now), which ensures there is just
    a single comparison instruction and then just tests based on flags.
    Now, if we have code like:
      auto a = x <=> y;
      if (a == std::partial_ordering::less)
        bar ();
      else if (a == std::partial_ordering::greater)
        baz ();
      else if (a == std::partial_ordering::equivalent)
        qux ();
      else if (a == std::partial_ordering::unordered)
        corge ();
    etc., that results in decent code generation, the spaceship named pattern
    on x86 optimizes for the jumps, so emits comparisons on the flags, followed
    by setting the result to -1, 0, 1, 2 and subsequent jump pass optimizes
that
    well.  But if the result needs to be stored into an integer and just
    returned that way or there are no immediate jumps based on it (or turned
    into some non-standard integer values like -42, 0, 36, 75 etc.), then CE
    doesn't do a good job for that, we end up with say
            comiss  %xmm1, %xmm0
            jp      .L4
            seta    %al
            movl    $0, %edx
            leal    -1(%rax,%rax), %eax
            cmove   %edx, %eax
            ret
    .L4:
            movl    $2, %eax
            ret
    The jp is good, that is the unlikely case and can't be easily handled in
    straight line code due to the layout of the flags, but the rest uses cmov
    which often isn't a win and a weird math.
    With the patch below we can get instead
            xorl    %eax, %eax
            comiss  %xmm1, %xmm0
            jp      .L2
            seta    %al
            sbbl    $0, %eax
            ret
    .L2:
            movl    $2, %eax
            ret

    The patch changes the discovery in the generic code, by detecting if
    the future .SPACESHIP result is just used in a PHI with -1, 0, 1 or
    -1, 0, 1, 2 values (the latter for HONOR_NANS) and passes that as a flag in
    a new argument to .SPACESHIP ifn, so that the named pattern is told whether
    it should optimize for branches or for loading the result into a -1, 0, 1
    (, 2) integer.  Additionally, it doesn't detect just floating point <=>
    anymore, but also integer and unsigned integer, but in those cases only
    if an integer -1, 0, 1 is wanted (otherwise == and > or similar comparisons
    result in good code).
    The backend then can for those integer or unsigned integer <=>s return
    effectively (x > y) - (x < y) in a way that is efficient on the target
    (so for x86 with ensuring zero initialization first when needed before
    setcc; one for floating point and unsigned, where there is just one setcc
    and the second one optimized into sbb instruction, two for the signed int
    case).  So e.g. for signed int we now emit
            xorl    %edx, %edx
            xorl    %eax, %eax
            cmpl    %esi, %edi
            setl    %dl
            setg    %al
            subl    %edx, %eax
            ret
    and for unsigned
            xorl    %eax, %eax
            cmpl    %esi, %edi
            seta    %al
            sbbb    $0, %al
            ret

    Note, I wonder if other targets wouldn't benefit from defining the
    named optab too...

    2024-10-07  Jakub Jelinek  <ja...@redhat.com>

            PR middle-end/116896
            * optabs.def (spaceship_optab): Use spaceship$a4 rather than
            spaceship$a3.
            * internal-fn.cc (expand_SPACESHIP): Expect 3 call arguments
            rather than 2, expand the last one, expect 4 operands of
            spaceship_optab.
            * tree-ssa-math-opts.cc: Include cfghooks.h.
            (optimize_spaceship): Check if a single PHI is initialized to
            -1, 0, 1, 2 or -1, 0, 1 values, in that case pass 1 as last (new)
            argument to .SPACESHIP and optimize away the comparisons,
            otherwise pass 0.  Also check for integer comparisons rather than
            floating point, in that case do it only if there is a single PHI
            with -1, 0, 1 values and pass 1 to last argument of .SPACESHIP
            if the <=> is signed, 2 if unsigned.
            * config/i386/i386-protos.h (ix86_expand_fp_spaceship): Add
            another rtx argument.
            (ix86_expand_int_spaceship): Declare.
            * config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Add
            arg3 argument, if it is const0_rtx, expand like before, otherwise
            emit optimized sequence for setting the result into a GPR.
            (ix86_expand_int_spaceship): New function.
            * config/i386/i386.md (UNSPEC_SETCC_SI_SLP): New UNSPEC code.
            (setcc_si_slp): New define_expand.
            (*setcc_si_slp): New define_insn_and_split.
            (setcc + setcc + movzbl): New define_peephole2.
            (spaceship<mode>3): Renamed to ...
            (spaceship<mode>4): ... this.  Add an extra operand, pass it
            to ix86_expand_fp_spaceship.
            (spaceshipxf3): Renamed to ...
            (spaceshipxf4): ... this.  Add an extra operand, pass it
            to ix86_expand_fp_spaceship.
            (spaceship<mode>4): New define_expand for SWI modes.
            * doc/md.texi (spaceship@var{m}3): Renamed to ...
            (spaceship@var{m}4): ... this.  Document the meaning of last
            operand.

            * g++.target/i386/pr116896-1.C: New test.
            * g++.target/i386/pr116896-2.C: New test.

Reply via email to