On Fri, 4 Oct 2024, Uros Bizjak wrote:

> On Fri, Oct 4, 2024 at 11:58 AM Jakub Jelinek <ja...@redhat.com> wrote:
> >
> > Hi!
> >
> > The PR notes that we don't emit optimal code for C++ spaceship
> > operator if the result is returned as an integer rather than the
> > result just being compared against different values and different
> > code executed based on that.
> > So e.g. for
> > template <typename T>
> > auto foo (T x, T y) { return x <=> y; }
> > for floating point types, signed integer types and unsigned integer
> > types alike.  auto in that case is std::strong_ordering or
> > std::partial_ordering, which are fancy C++ abstractions around a struct
> > with a signed char member whose value is -1, 0 or 1 for the strong
> > ordering and -1, 0, 1 or 2 for the partial ordering (though with
> > -ffast-math 2 never occurs).
> > I'm afraid functions like that are fairly common and unless they are
> > inlined, we really need to map the comparison to those -1, 0, 1 or
> > -1, 0, 1, 2 values.
> >
> > Now, for floating point spaceship I've already added an optimization in
> > the past (with discovery in tree-ssa-math-opts.cc and a named optab,
> > though the optab is currently only defined on x86), which ensures there
> > is just a single comparison instruction and then just tests based on
> > the flags.
> > Now, if we have code like:
> >   auto a = x <=> y;
> >   if (a == std::partial_ordering::less)
> >     bar ();
> >   else if (a == std::partial_ordering::greater)
> >     baz ();
> >   else if (a == std::partial_ordering::equivalent)
> >     qux ();
> >   else if (a == std::partial_ordering::unordered)
> >     corge ();
> > etc., that results in decent code generation: the spaceship named pattern
> > on x86 optimizes for the jumps, so it emits comparisons on the flags,
> > followed by setting the result to -1, 0, 1, 2, and the subsequent jump
> > pass optimizes that well.  But if the result needs to be stored into an
> > integer and just returned that way, or there are no immediate jumps based
> > on it (or it is turned into some non-standard integer values like
> > -42, 0, 36, 75 etc.), then CE doesn't do a good job with that; we end up
> > with say
> >         comiss  %xmm1, %xmm0
> >         jp      .L4
> >         seta    %al
> >         movl    $0, %edx
> >         leal    -1(%rax,%rax), %eax
> >         cmove   %edx, %eax
> >         ret
> > .L4:
> >         movl    $2, %eax
> >         ret
> > The jp is good, that is the unlikely case and can't be easily handled in
> > straight-line code due to the layout of the flags, but the rest uses
> > cmov, which often isn't a win, and weird math.
> > With the patch below we can get instead
> >         xorl    %eax, %eax
> >         comiss  %xmm1, %xmm0
> >         jp      .L2
> >         seta    %al
> >         sbbl    $0, %eax
> >         ret
> > .L2:
> >         movl    $2, %eax
> >         ret
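In source terms the improved sequence computes the following (a sketch of
the semantics only, not the emitted instructions):

```cpp
// Semantics of the xorl/comiss/jp/seta/sbbl sequence: the jp branch takes
// the unordered (NaN) case out of line; on the fall-through path comiss
// leaves CF set iff x < y, so seta followed by sbb $0 computes
// (x > y) - (x < y) without any cmov.
#include <cassert>
#include <limits>

int fp_spaceship_value (float x, float y)
{
  if (x != x || y != y)          // unordered: either operand is NaN
    return 2;
  return (x > y) - (x < y);      // -1, 0 or 1
}
```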
> >
> > The patch changes the discovery in the generic code, by detecting if
> > the future .SPACESHIP result is just used in a PHI with -1, 0, 1 or
> > -1, 0, 1, 2 values (the latter for HONOR_NANS) and passes that as a flag in
> > a new argument to .SPACESHIP ifn, so that the named pattern is told whether
> > it should optimize for branches or for loading the result into a -1, 0, 1
> > (, 2) integer.  Additionally, it doesn't detect just floating point <=>
> > anymore, but also integer and unsigned integer, but in those cases only
> > if an integer -1, 0, 1 is wanted (otherwise == and > or similar comparisons
> > result in good code).
> > The backend can then, for those integer or unsigned integer <=>s,
> > effectively return (x > y) - (x < y) in a way that is efficient on the target
> > (so for x86 with ensuring zero initialization first when needed before
> > setcc; one for floating point and unsigned, where there is just one setcc
> > and the second one optimized into sbb instruction, two for the signed int
> > case).  So e.g. for signed int we now emit
> >         xorl    %edx, %edx
> >         xorl    %eax, %eax
> >         cmpl    %esi, %edi
> >         setl    %dl
> >         setg    %al
> >         subl    %edx, %eax
> >         ret
> > and for unsigned
> >         xorl    %eax, %eax
> >         cmpl    %esi, %edi
> >         seta    %al
> >         sbbb    $0, %al
> >         ret
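Both sequences implement the same source-level identity; as a sketch (the
function names here are made up for illustration):

```cpp
// Both expansions compute (x > y) - (x < y): two flag tests subtracted.
// For the unsigned case cmp already leaves x < y in the carry flag, which
// is why the second setcc folds into an sbb $0; the signed case needs two
// setcc instructions (setl/setg) and a sub.
#include <cassert>

int int_spaceship (int x, int y)
{
  return (x > y) - (x < y);
}

int uint_spaceship (unsigned x, unsigned y)
{
  return (x > y) - (x < y);
}
```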
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The optimization added to forwprop looks like it would match PHI-opt better,
but I'm fine with leaving it in forwprop.  I do wonder whether, instead
of adding a flag, passing the actual values wanted as arguments to
.SPACESHIP would allow further optimizations (maybe also indicating
cases to be "ignored")?  Are those -1, 0, 1, 2 standard-mandated values
or implementation defined (or part of the platform ABI)?

That said, the patch is OK.

Thanks,
Richard.

> > Note, I wonder if other targets wouldn't benefit from defining the
> > named optab too...
> >
> > 2024-10-04  Jakub Jelinek  <ja...@redhat.com>
> >
> >         PR middle-end/116896
> >         * optabs.def (spaceship_optab): Use spaceship$a4 rather than
> >         spaceship$a3.
> >         * internal-fn.cc (expand_SPACESHIP): Expect 3 call arguments
> >         rather than 2, expand the last one, expect 4 operands of
> >         spaceship_optab.
> >         * tree-ssa-math-opts.cc: Include cfghooks.h.
> >         (optimize_spaceship): Check if a single PHI is initialized to
> >         -1, 0, 1, 2 or -1, 0, 1 values, in that case pass 1 as last (new)
> >         argument to .SPACESHIP and optimize away the comparisons,
> >         otherwise pass 0.  Also check for integer comparisons rather than
> >         floating point, in that case do it only if there is a single PHI
> >         with -1, 0, 1 values and pass 1 to last argument of .SPACESHIP
> >         if the <=> is signed, 2 if unsigned.
> >         * config/i386/i386-protos.h (ix86_expand_fp_spaceship): Add
> >         another rtx argument.
> >         (ix86_expand_int_spaceship): Declare.
> >         * config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Add
> >         arg3 argument, if it is const0_rtx, expand like before, otherwise
> >         emit optimized sequence for setting the result into a GPR.
> >         (ix86_expand_int_spaceship): New function.
> >         * config/i386/i386.md (UNSPEC_SETCC_SI_SLP): New UNSPEC code.
> >         (setcc_si_slp): New define_expand.
> >         (*setcc_si_slp): New define_insn_and_split.
> >         (setcc + setcc + movzbl): New define_peephole2.
> >         (spaceship<mode>3): Renamed to ...
> >         (spaceship<mode>4): ... this.  Add an extra operand, pass it
> >         to ix86_expand_fp_spaceship.
> >         (spaceshipxf3): Renamed to ...
> >         (spaceshipxf4): ... this.  Add an extra operand, pass it
> >         to ix86_expand_fp_spaceship.
> >         (spaceship<mode>4): New define_expand for SWI modes.
> >         * doc/md.texi (spaceship@var{m}3): Renamed to ...
> >         (spaceship@var{m}4): ... this.  Document the meaning of last
> >         operand.
> >
> >         * g++.target/i386/pr116896-1.C: New test.
> >         * g++.target/i386/pr116896-2.C: New test.
> 
> LGTM for the x86 part.
> 
> Thanks,
> Uros.
> 
> >
> > --- gcc/optabs.def.jj   2024-10-01 09:38:58.143960049 +0200
> > +++ gcc/optabs.def      2024-10-02 13:52:53.352550358 +0200
> > @@ -308,7 +308,7 @@ OPTAB_D (negv3_optab, "negv$I$a3")
> >  OPTAB_D (uaddc5_optab, "uaddc$I$a5")
> >  OPTAB_D (usubc5_optab, "usubc$I$a5")
> >  OPTAB_D (addptr3_optab, "addptr$a3")
> > -OPTAB_D (spaceship_optab, "spaceship$a3")
> > +OPTAB_D (spaceship_optab, "spaceship$a4")
> >
> >  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
> >  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
> > --- gcc/internal-fn.cc.jj       2024-10-01 09:38:41.482192804 +0200
> > +++ gcc/internal-fn.cc  2024-10-02 13:52:53.354550330 +0200
> > @@ -5107,6 +5107,7 @@ expand_SPACESHIP (internal_fn, gcall *st
> >    tree lhs = gimple_call_lhs (stmt);
> >    tree rhs1 = gimple_call_arg (stmt, 0);
> >    tree rhs2 = gimple_call_arg (stmt, 1);
> > +  tree rhs3 = gimple_call_arg (stmt, 2);
> >    tree type = TREE_TYPE (rhs1);
> >
> >    do_pending_stack_adjust ();
> > @@ -5114,13 +5115,15 @@ expand_SPACESHIP (internal_fn, gcall *st
> >    rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >    rtx op1 = expand_normal (rhs1);
> >    rtx op2 = expand_normal (rhs2);
> > +  rtx op3 = expand_normal (rhs3);
> >
> > -  class expand_operand ops[3];
> > +  class expand_operand ops[4];
> >    create_call_lhs_operand (&ops[0], target, TYPE_MODE (TREE_TYPE (lhs)));
> >    create_input_operand (&ops[1], op1, TYPE_MODE (type));
> >    create_input_operand (&ops[2], op2, TYPE_MODE (type));
> > +  create_input_operand (&ops[3], op3, TYPE_MODE (TREE_TYPE (rhs3)));
> >    insn_code icode = optab_handler (spaceship_optab, TYPE_MODE (type));
> > -  expand_insn (icode, 3, ops);
> > +  expand_insn (icode, 4, ops);
> >    assign_call_lhs (lhs, target, &ops[0]);
> >  }
> >
> > --- gcc/tree-ssa-math-opts.cc.jj        2024-10-01 09:38:58.331957422 +0200
> > +++ gcc/tree-ssa-math-opts.cc   2024-10-03 15:00:15.893473504 +0200
> > @@ -117,6 +117,7 @@ along with GCC; see the file COPYING3.
> >  #include "domwalk.h"
> >  #include "tree-ssa-math-opts.h"
> >  #include "dbgcnt.h"
> > +#include "cfghooks.h"
> >
> >  /* This structure represents one basic block that either computes a
> >     division, or is a common dominator for basic block that compute a
> > @@ -5869,7 +5870,7 @@ convert_mult_to_highpart (gassign *stmt,
> >     <bb 6> [local count: 1073741824]:
> >     and turn it into:
> >     <bb 2> [local count: 1073741824]:
> > -   _1 = .SPACESHIP (a_2(D), b_3(D));
> > +   _1 = .SPACESHIP (a_2(D), b_3(D), 0);
> >     if (_1 == 0)
> >       goto <bb 6>; [34.00%]
> >     else
> > @@ -5891,7 +5892,13 @@ convert_mult_to_highpart (gassign *stmt,
> >
> >     <bb 6> [local count: 1073741824]:
> >     so that the backend can emit optimal comparison and
> > -   conditional jump sequence.  */
> > +   conditional jump sequence.  If the
> > +   <bb 6> [local count: 1073741824]:
> > +   above has a single PHI like:
> > +   # _27 = PHI<0(2), -1(3), 2(4), 1(5)>
> > +   then replace it with effectively
> > +   _1 = .SPACESHIP (a_2(D), b_3(D), 1);
> > +   _27 = _1;  */
> >
> >  static void
> >  optimize_spaceship (gcond *stmt)
> > @@ -5901,7 +5908,8 @@ optimize_spaceship (gcond *stmt)
> >      return;
> >    tree arg1 = gimple_cond_lhs (stmt);
> >    tree arg2 = gimple_cond_rhs (stmt);
> > -  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg1))
> > +  if ((!SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg1))
> > +       && !INTEGRAL_TYPE_P (TREE_TYPE (arg1)))
> >        || optab_handler (spaceship_optab,
> >                         TYPE_MODE (TREE_TYPE (arg1))) == CODE_FOR_nothing
> >        || operand_equal_p (arg1, arg2, 0))
> > @@ -6013,12 +6021,105 @@ optimize_spaceship (gcond *stmt)
> >         }
> >      }
> >
> > -  gcall *gc = gimple_build_call_internal (IFN_SPACESHIP, 2, arg1, arg2);
> > +  /* Check if there is a single bb into which all failed conditions
> > +     jump to (perhaps through an empty block) and if it results in
> > +     a single integral PHI which just sets it to -1, 0, 1, 2
> > +     (or -1, 0, 1 when NaNs can't happen).  In that case use 1 rather
> > +     than 0 as last .SPACESHIP argument to tell backends it might
> > +     consider different code generation and just cast the result
> > +     of .SPACESHIP to the PHI result.  */
> > +  tree arg3 = integer_zero_node;
> > +  edge e = EDGE_SUCC (bb0, 0);
> > +  if (e->dest == bb1)
> > +    e = EDGE_SUCC (bb0, 1);
> > +  basic_block bbp = e->dest;
> > +  gphi *phi = NULL;
> > +  for (gphi_iterator psi = gsi_start_phis (bbp);
> > +       !gsi_end_p (psi); gsi_next (&psi))
> > +    {
> > +      gphi *gp = psi.phi ();
> > +      tree res = gimple_phi_result (gp);
> > +
> > +      if (phi != NULL
> > +         || virtual_operand_p (res)
> > +         || !INTEGRAL_TYPE_P (TREE_TYPE (res))
> > +         || TYPE_PRECISION (TREE_TYPE (res)) < 2)
> > +       {
> > +         phi = NULL;
> > +         break;
> > +       }
> > +      phi = gp;
> > +    }
> > +  if (phi
> > +      && integer_zerop (gimple_phi_arg_def_from_edge (phi, e))
> > +      && EDGE_COUNT (bbp->preds) == (HONOR_NANS (TREE_TYPE (arg1)) ? 4 : 3))
> > +    {
> > +      for (unsigned i = 0; phi && i < EDGE_COUNT (bbp->preds) - 1; ++i)
> > +       {
> > +         edge e3 = i == 0 ? e1 : i == 1 ? em1 : e2;
> > +         if (e3->dest != bbp)
> > +           {
> > +             if (!empty_block_p (e3->dest)
> > +                 || !single_succ_p (e3->dest)
> > +                 || single_succ (e3->dest) != bbp)
> > +               {
> > +                 phi = NULL;
> > +                 break;
> > +               }
> > +             e3 = single_succ_edge (e3->dest);
> > +           }
> > +         tree a = gimple_phi_arg_def_from_edge (phi, e3);
> > +         if (TREE_CODE (a) != INTEGER_CST
> > +             || (i == 0 && !integer_onep (a))
> > +             || (i == 1 && !integer_all_onesp (a))
> > +             || (i == 2 && wi::to_widest (a) != 2))
> > +           {
> > +             phi = NULL;
> > +             break;
> > +           }
> > +       }
> > +      if (phi)
> > +       arg3 = build_int_cst (integer_type_node,
> > +                             TYPE_UNSIGNED (TREE_TYPE (arg1)) ? 2 : 1);
> > +    }
> > +
> > +  /* For integral <=> comparisons only use .SPACESHIP if it is turned
> > +     into an integer (-1, 0, 1).  */
> > +  if (!SCALAR_FLOAT_TYPE_P (TREE_TYPE (arg1)) && arg3 == integer_zero_node)
> > +    return;
> > +
> > +  gcall *gc = gimple_build_call_internal (IFN_SPACESHIP, 3, arg1, arg2, arg3);
> >    tree lhs = make_ssa_name (integer_type_node);
> >    gimple_call_set_lhs (gc, lhs);
> >    gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> >    gsi_insert_before (&gsi, gc, GSI_SAME_STMT);
> >
> > +  wide_int wm1 = wi::minus_one (TYPE_PRECISION (integer_type_node));
> > +  wide_int w2 = (HONOR_NANS (TREE_TYPE (arg1))
> > +                ? wi::two (TYPE_PRECISION (integer_type_node))
> > +                : wi::one (TYPE_PRECISION (integer_type_node)));
> > +  int_range<1> vr (TREE_TYPE (lhs), wm1, w2);
> > +  set_range_info (lhs, vr);
> > +
> > +  if (arg3 != integer_zero_node)
> > +    {
> > +      tree type = TREE_TYPE (gimple_phi_result (phi));
> > +      if (!useless_type_conversion_p (type, integer_type_node))
> > +       {
> > +         tree tem = make_ssa_name (type);
> > +         gimple *gcv = gimple_build_assign (tem, NOP_EXPR, lhs);
> > +         gsi_insert_before (&gsi, gcv, GSI_SAME_STMT);
> > +         lhs = tem;
> > +       }
> > +      SET_PHI_ARG_DEF_ON_EDGE (phi, e, lhs);
> > +      gimple_cond_set_lhs (stmt, boolean_false_node);
> > +      gimple_cond_set_rhs (stmt, boolean_false_node);
> > +      gimple_cond_set_code (stmt, (e->flags & EDGE_TRUE_VALUE)
> > +                                 ? EQ_EXPR : NE_EXPR);
> > +      update_stmt (stmt);
> > +      return;
> > +    }
> > +
> >    gimple_cond_set_lhs (stmt, lhs);
> >    gimple_cond_set_rhs (stmt, integer_zero_node);
> >    update_stmt (stmt);
> > @@ -6055,11 +6156,6 @@ optimize_spaceship (gcond *stmt)
> >                             (e2->flags & EDGE_TRUE_VALUE) ? NE_EXPR : EQ_EXPR);
> >        update_stmt (cond);
> >      }
> > -
> > -  wide_int wm1 = wi::minus_one (TYPE_PRECISION (integer_type_node));
> > -  wide_int w2 = wi::two (TYPE_PRECISION (integer_type_node));
> > -  int_range<1> vr (TREE_TYPE (lhs), wm1, w2);
> > -  set_range_info (lhs, vr);
> >  }
> >
> >
> > --- gcc/config/i386/i386-protos.h.jj    2024-10-01 09:38:41.237196225 +0200
> > +++ gcc/config/i386/i386-protos.h       2024-10-02 18:45:12.807284696 +0200
> > @@ -164,7 +164,8 @@ extern bool ix86_expand_fp_vec_cmp (rtx[
> >  extern void ix86_expand_sse_movcc (rtx, rtx, rtx, rtx);
> >  extern void ix86_expand_sse_extend (rtx, rtx, bool);
> >  extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
> > -extern void ix86_expand_fp_spaceship (rtx, rtx, rtx);
> > +extern void ix86_expand_fp_spaceship (rtx, rtx, rtx, rtx);
> > +extern void ix86_expand_int_spaceship (rtx, rtx, rtx, rtx);
> >  extern bool ix86_expand_int_addcc (rtx[]);
> >  extern void ix86_expand_carry (rtx arg);
> >  extern rtx_insn *ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
> > --- gcc/config/i386/i386-expand.cc.jj   2024-10-01 09:38:41.181197008 +0200
> > +++ gcc/config/i386/i386-expand.cc      2024-10-04 11:05:09.647022644 +0200
> > @@ -3144,12 +3144,15 @@ ix86_expand_setcc (rtx dest, enum rtx_co
> >     dest = op0 == op1 ? 0 : op0 < op1 ? -1 : op0 > op1 ? 1 : 2.  */
> >
> >  void
> > -ix86_expand_fp_spaceship (rtx dest, rtx op0, rtx op1)
> > +ix86_expand_fp_spaceship (rtx dest, rtx op0, rtx op1, rtx op2)
> >  {
> >    gcc_checking_assert (ix86_fp_comparison_strategy (GT) != IX86_FPCMP_ARITH);
> > +  rtx zero = NULL_RTX;
> > +  if (op2 != const0_rtx && TARGET_IEEE_FP && GET_MODE (dest) == SImode)
> > +    zero = force_reg (SImode, const0_rtx);
> >    rtx gt = ix86_expand_fp_compare (GT, op0, op1);
> > -  rtx l0 = gen_label_rtx ();
> > -  rtx l1 = gen_label_rtx ();
> > +  rtx l0 = op2 == const0_rtx ? gen_label_rtx () : NULL_RTX;
> > +  rtx l1 = op2 == const0_rtx ? gen_label_rtx () : NULL_RTX;
> >    rtx l2 = TARGET_IEEE_FP ? gen_label_rtx () : NULL_RTX;
> >    rtx lend = gen_label_rtx ();
> >    rtx tmp;
> > @@ -3163,23 +3166,68 @@ ix86_expand_fp_spaceship (rtx dest, rtx
> >        jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> >        add_reg_br_prob_note (jmp, profile_probability::very_unlikely ());
> >      }
> > -  rtx eq = gen_rtx_fmt_ee (UNEQ, VOIDmode,
> > -                          gen_rtx_REG (CCFPmode, FLAGS_REG), const0_rtx);
> > -  tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, eq,
> > -                             gen_rtx_LABEL_REF (VOIDmode, l0), pc_rtx);
> > -  jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> > -  add_reg_br_prob_note (jmp, profile_probability::unlikely ());
> > -  tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, gt,
> > -                             gen_rtx_LABEL_REF (VOIDmode, l1), pc_rtx);
> > -  jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> > -  add_reg_br_prob_note (jmp, profile_probability::even ());
> > -  emit_move_insn (dest, constm1_rtx);
> > -  emit_jump (lend);
> > -  emit_label (l0);
> > -  emit_move_insn (dest, const0_rtx);
> > -  emit_jump (lend);
> > -  emit_label (l1);
> > -  emit_move_insn (dest, const1_rtx);
> > +  if (op2 == const0_rtx)
> > +    {
> > -  rtx eq = gen_rtx_fmt_ee (UNEQ, VOIDmode,
> > -                          gen_rtx_REG (CCFPmode, FLAGS_REG), const0_rtx);
> > +      tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, eq,
> > +                                 gen_rtx_LABEL_REF (VOIDmode, l0), pc_rtx);
> > +      jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> > +      add_reg_br_prob_note (jmp, profile_probability::unlikely ());
> > +      tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, gt,
> > +                                 gen_rtx_LABEL_REF (VOIDmode, l1), pc_rtx);
> > +      jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
> > +      add_reg_br_prob_note (jmp, profile_probability::even ());
> > +      emit_move_insn (dest, constm1_rtx);
> > +      emit_jump (lend);
> > +      emit_label (l0);
> > +      emit_move_insn (dest, const0_rtx);
> > +      emit_jump (lend);
> > +      emit_label (l1);
> > +      emit_move_insn (dest, const1_rtx);
> > +    }
> > +  else
> > +    {
> > +      rtx lt_tmp = gen_reg_rtx (QImode);
> > +      ix86_expand_setcc (lt_tmp, UNLT, gen_rtx_REG (CCFPmode, FLAGS_REG),
> > +                        const0_rtx);
> > +      if (GET_MODE (dest) != QImode)
> > +       {
> > +         tmp = gen_reg_rtx (GET_MODE (dest));
> > +         emit_insn (gen_rtx_SET (tmp, gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> > +                                                           lt_tmp)));
> > +         lt_tmp = tmp;
> > +       }
> > +      rtx gt_tmp;
> > +      if (zero)
> > +       {
> > +         /* If TARGET_IEEE_FP and dest has SImode, emit SImode clear
> > +            before the floating point comparison and use setcc_si_slp
> > +            pattern to hide it from the combiner, so that it doesn't
> > +            undo it.  */
> > +         tmp = ix86_expand_compare (GT, XEXP (gt, 0), const0_rtx);
> > +         PUT_MODE (tmp, QImode);
> > +         emit_insn (gen_setcc_si_slp (zero, tmp, zero));
> > +         gt_tmp = zero;
> > +       }
> > +      else
> > +       {
> > +         gt_tmp = gen_reg_rtx (QImode);
> > +         ix86_expand_setcc (gt_tmp, GT, XEXP (gt, 0), const0_rtx);
> > +         if (GET_MODE (dest) != QImode)
> > +           {
> > +             tmp = gen_reg_rtx (GET_MODE (dest));
> > +             emit_insn (gen_rtx_SET (tmp,
> > +                                     gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> > +                                                          gt_tmp)));
> > +             gt_tmp = tmp;
> > +           }
> > +       }
> > +      tmp = expand_simple_binop (GET_MODE (dest), MINUS, gt_tmp, lt_tmp, dest,
> > +                                0, OPTAB_DIRECT);
> > +      if (!rtx_equal_p (tmp, dest))
> > +       emit_move_insn (dest, tmp);
> > +    }
> >    emit_jump (lend);
> >    if (l2)
> >      {
> > @@ -3189,6 +3237,46 @@ ix86_expand_fp_spaceship (rtx dest, rtx
> >    emit_label (lend);
> >  }
> >
> > +/* Expand integral op0 <=> op1, i.e.
> > +   dest = op0 == op1 ? 0 : op0 < op1 ? -1 : 1.  */
> > +
> > +void
> > +ix86_expand_int_spaceship (rtx dest, rtx op0, rtx op1, rtx op2)
> > +{
> > +  gcc_assert (INTVAL (op2));
> > +  /* Not using ix86_expand_int_compare here, so that it doesn't swap
> > +     operands nor optimize CC mode - we need a mode usable for both
> > +     LT and GT resp. LTU and GTU comparisons with the same unswapped
> > +     operands.  */
> > +  rtx flags = gen_rtx_REG (INTVAL (op2) == 1 ? CCGCmode : CCmode, FLAGS_REG);
> > +  rtx tmp = gen_rtx_COMPARE (GET_MODE (flags), op0, op1);
> > +  emit_insn (gen_rtx_SET (flags, tmp));
> > +  rtx lt_tmp = gen_reg_rtx (QImode);
> > +  ix86_expand_setcc (lt_tmp, INTVAL (op2) == 1 ? LT : LTU, flags,
> > +                    const0_rtx);
> > +  if (GET_MODE (dest) != QImode)
> > +    {
> > +      tmp = gen_reg_rtx (GET_MODE (dest));
> > +      emit_insn (gen_rtx_SET (tmp, gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> > +                                                       lt_tmp)));
> > +      lt_tmp = tmp;
> > +    }
> > +  rtx gt_tmp = gen_reg_rtx (QImode);
> > +  ix86_expand_setcc (gt_tmp, INTVAL (op2) == 1 ? GT : GTU, flags,
> > +                    const0_rtx);
> > +  if (GET_MODE (dest) != QImode)
> > +    {
> > +      tmp = gen_reg_rtx (GET_MODE (dest));
> > +      emit_insn (gen_rtx_SET (tmp, gen_rtx_ZERO_EXTEND (GET_MODE (dest),
> > +                                                       gt_tmp)));
> > +      gt_tmp = tmp;
> > +    }
> > +  tmp = expand_simple_binop (GET_MODE (dest), MINUS, gt_tmp, lt_tmp, dest,
> > +                            0, OPTAB_DIRECT);
> > +  if (!rtx_equal_p (tmp, dest))
> > +    emit_move_insn (dest, tmp);
> > +}
> > +
> >  /* Expand comparison setting or clearing carry flag.  Return true when
> >     successful and set pop for the operation.  */
> >  static bool
> > --- gcc/config/i386/i386.md.jj  2024-10-01 09:38:41.296195402 +0200
> > +++ gcc/config/i386/i386.md     2024-10-03 14:32:24.661446577 +0200
> > @@ -118,6 +118,7 @@ (define_c_enum "unspec" [
> >    UNSPEC_PUSHFL
> >    UNSPEC_POPFL
> >    UNSPEC_OPTCOMX
> > +  UNSPEC_SETCC_SI_SLP
> >
> >    ;; For SSE/MMX support:
> >    UNSPEC_FIX_NOTRUNC
> > @@ -19281,6 +19282,27 @@ (define_insn "*setcc_qi_slp"
> >    [(set_attr "type" "setcc")
> >     (set_attr "mode" "QI")])
> >
> > +(define_expand "setcc_si_slp"
> > +  [(set (match_operand:SI 0 "register_operand")
> > +       (unspec:SI
> > +         [(match_operand:QI 1)
> > +          (match_operand:SI 2 "register_operand")] UNSPEC_SETCC_SI_SLP))])
> > +
> > +(define_insn_and_split "*setcc_si_slp"
> > +  [(set (match_operand:SI 0 "register_operand" "=q")
> > +       (unspec:SI
> > +         [(match_operator:QI 1 "ix86_comparison_operator"
> > +            [(reg FLAGS_REG) (const_int 0)])
> > +          (match_operand:SI 2 "register_operand" "0")] UNSPEC_SETCC_SI_SLP))]
> > +  "ix86_pre_reload_split ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 0) (match_dup 2))
> > +   (set (strict_low_part (match_dup 3)) (match_dup 1))]
> > +{
> > +  operands[3] = gen_lowpart (QImode, operands[0]);
> > +})
> > +
> >  ;; In general it is not safe to assume too much about CCmode registers,
> >  ;; so simplify-rtx stops when it sees a second one.  Under certain
> >  ;; conditions this is safe on x86, so help combine not create
> > @@ -19776,6 +19798,32 @@ (define_peephole2
> >    operands[8] = gen_lowpart (QImode, operands[4]);
> >    ix86_expand_clear (operands[4]);
> >  })
> > +
> > +(define_peephole2
> > +  [(set (match_operand 4 "flags_reg_operand") (match_operand 0))
> > +   (set (strict_low_part (match_operand:QI 5 "register_operand"))
> > +       (match_operator:QI 6 "ix86_comparison_operator"
> > +         [(reg FLAGS_REG) (const_int 0)]))
> > +   (set (match_operand:QI 1 "register_operand")
> > +       (match_operator:QI 2 "ix86_comparison_operator"
> > +         [(reg FLAGS_REG) (const_int 0)]))
> > +   (set (match_operand 3 "any_QIreg_operand")
> > +       (zero_extend (match_dup 1)))]
> > +  "(peep2_reg_dead_p (4, operands[1])
> > +    || operands_match_p (operands[1], operands[3]))
> > +   && ! reg_overlap_mentioned_p (operands[3], operands[0])
> > +   && ! reg_overlap_mentioned_p (operands[3], operands[5])
> > +   && ! reg_overlap_mentioned_p (operands[1], operands[5])
> > +   && peep2_regno_dead_p (0, FLAGS_REG)"
> > +  [(set (match_dup 4) (match_dup 0))
> > +   (set (strict_low_part (match_dup 5))
> > +       (match_dup 6))
> > +   (set (strict_low_part (match_dup 7))
> > +       (match_dup 2))]
> > +{
> > +  operands[7] = gen_lowpart (QImode, operands[3]);
> > +  ix86_expand_clear (operands[3]);
> > +})
> >
> >  ;; Call instructions.
> >
> > @@ -29494,24 +29542,40 @@ (define_insn "hreset"
> >     (set_attr "length" "4")])
> >
> >  ;; Spaceship optimization
> > -(define_expand "spaceship<mode>3"
> > +(define_expand "spaceship<mode>4"
> >    [(match_operand:SI 0 "register_operand")
> >     (match_operand:MODEF 1 "cmp_fp_expander_operand")
> > -   (match_operand:MODEF 2 "cmp_fp_expander_operand")]
> > +   (match_operand:MODEF 2 "cmp_fp_expander_operand")
> > +   (match_operand:SI 3 "const_int_operand")]
> >    "(TARGET_80387 || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH))
> >     && (TARGET_CMOVE || (TARGET_SAHF && TARGET_USE_SAHF))"
> >  {
> > -  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2]);
> > +  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2],
> > +                           operands[3]);
> >    DONE;
> >  })
> >
> > -(define_expand "spaceshipxf3"
> > +(define_expand "spaceshipxf4"
> >    [(match_operand:SI 0 "register_operand")
> >     (match_operand:XF 1 "nonmemory_operand")
> > -   (match_operand:XF 2 "nonmemory_operand")]
> > +   (match_operand:XF 2 "nonmemory_operand")
> > +   (match_operand:SI 3 "const_int_operand")]
> >    "TARGET_80387 && (TARGET_CMOVE || (TARGET_SAHF && TARGET_USE_SAHF))"
> >  {
> > -  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2]);
> > +  ix86_expand_fp_spaceship (operands[0], operands[1], operands[2],
> > +                           operands[3]);
> > +  DONE;
> > +})
> > +
> > +(define_expand "spaceship<mode>4"
> > +  [(match_operand:SI 0 "register_operand")
> > +   (match_operand:SWI 1 "nonimmediate_operand")
> > +   (match_operand:SWI 2 "<general_operand>")
> > +   (match_operand:SI 3 "const_int_operand")]
> > +  ""
> > +{
> > +  ix86_expand_int_spaceship (operands[0], operands[1], operands[2],
> > +                            operands[3]);
> >    DONE;
> >  })
> >
> > --- gcc/doc/md.texi.jj  2024-10-01 09:38:58.035961557 +0200
> > +++ gcc/doc/md.texi     2024-10-02 20:16:22.502329039 +0200
> > @@ -8568,11 +8568,15 @@ inclusive and operand 1 exclusive.
> >  If this pattern is not defined, a call to the library function
> >  @code{__clear_cache} is used.
> >
> > -@cindex @code{spaceship@var{m}3} instruction pattern
> > -@item @samp{spaceship@var{m}3}
> > +@cindex @code{spaceship@var{m}4} instruction pattern
> > +@item @samp{spaceship@var{m}4}
> >  Initialize output operand 0 with mode of integer type to -1, 0, 1 or 2
> >  if operand 1 with mode @var{m} compares less than operand 2, equal to
> >  operand 2, greater than operand 2 or is unordered with operand 2.
> > +Operand 3 should be @code{const0_rtx} if the result is used in comparisons,
> > +@code{const1_rtx} if the result is used as integer value and the comparison
> > +is signed, @code{const2_rtx} if the result is used as integer value and
> > +the comparison is unsigned.
> >  @var{m} should be a scalar floating point mode.
> >
> >  This pattern is not allowed to @code{FAIL}.
> > --- gcc/testsuite/g++.target/i386/pr116896-1.C.jj       2024-10-03 14:40:27.071813336 +0200
> > +++ gcc/testsuite/g++.target/i386/pr116896-1.C  2024-10-03 15:52:06.243819660 +0200
> > @@ -0,0 +1,35 @@
> > +// PR middle-end/116896
> > +// { dg-do compile { target c++20 } }
> > +// { dg-options "-O2 -masm=att -fno-stack-protector" }
> > +// { dg-final { scan-assembler-times "\tjp\t" 1 } }
> > +// { dg-final { scan-assembler-not "\tj\[^mp\]\[a-z\]*\t" } }
> > +// { dg-final { scan-assembler-times "\tsbb\[bl\]\t\\\$0, " 3 } }
> > +// { dg-final { scan-assembler-times "\tseta\t" 3 } }
> > +// { dg-final { scan-assembler-times "\tsetg\t" 1 } }
> > +// { dg-final { scan-assembler-times "\tsetl\t" 1 } }
> > +
> > +#include <compare>
> > +
> > +[[gnu::noipa]] auto
> > +foo (float x, float y)
> > +{
> > +  return x <=> y;
> > +}
> > +
> > +[[gnu::noipa, gnu::optimize ("fast-math")]] auto
> > +bar (float x, float y)
> > +{
> > +  return x <=> y;
> > +}
> > +
> > +[[gnu::noipa]] auto
> > +baz (int x, int y)
> > +{
> > +  return x <=> y;
> > +}
> > +
> > +[[gnu::noipa]] auto
> > +qux (unsigned x, unsigned y)
> > +{
> > +  return x <=> y;
> > +}
> > --- gcc/testsuite/g++.target/i386/pr116896-2.C.jj       2024-10-03 14:40:37.203674018 +0200
> > +++ gcc/testsuite/g++.target/i386/pr116896-2.C  2024-10-04 10:55:07.468396073 +0200
> > @@ -0,0 +1,41 @@
> > +// PR middle-end/116896
> > +// { dg-do run { target c++20 } }
> > +// { dg-options "-O2" }
> > +
> > +#include "pr116896-1.C"
> > +
> > +[[gnu::noipa]] auto
> > +corge (int x)
> > +{
> > +  return x <=> 0;
> > +}
> > +
> > +[[gnu::noipa]] auto
> > +garply (unsigned x)
> > +{
> > +  return x <=> 0;
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  if (foo (-1.0f, 1.0f) != std::partial_ordering::less
> > +      || foo (1.0f, -1.0f) != std::partial_ordering::greater
> > +      || foo (1.0f, 1.0f) != std::partial_ordering::equivalent
> > +      || foo (__builtin_nanf (""), 1.0f) != std::partial_ordering::unordered
> > +      || bar (-2.0f, 2.0f) != std::partial_ordering::less
> > +      || bar (2.0f, -2.0f) != std::partial_ordering::greater
> > +      || bar (-5.0f, -5.0f) != std::partial_ordering::equivalent
> > +      || baz (-42, 42) != std::strong_ordering::less
> > +      || baz (42, -42) != std::strong_ordering::greater
> > +      || baz (42, 42) != std::strong_ordering::equal
> > +      || qux (40, 42) != std::strong_ordering::less
> > +      || qux (42, 40) != std::strong_ordering::greater
> > +      || qux (40, 40) != std::strong_ordering::equal
> > +      || corge (-15) != std::strong_ordering::less
> > +      || corge (15) != std::strong_ordering::greater
> > +      || corge (0) != std::strong_ordering::equal
> > +      || garply (15) != std::strong_ordering::greater
> > +      || garply (0) != std::strong_ordering::equal)
> > +    __builtin_abort ();
> > +}
> >
> >         Jakub
> >
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
