On Tue, Oct 28, 2025 at 6:35 PM Andrew Pinski
<[email protected]> wrote:

> > > +(define_insn_and_split 
> > > "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>"
> > > +  [(set (match_operand:GPI 0 "register_operand" "=r")
> > > +     (UMAXMIN:GPI
> > > +       (plus:GPI (match_operand:GPI 1 "register_operand" "r")
> > > +                 (match_operand:GPI 2 "register_operand" "r"))
> > > +       (match_dup ovf_commutate)))
> > > +   (clobber (match_scratch:GPI 3))]
> >
> > I think you might want an "=r" constraint on the match_scratch here.
>
> In this case since the split is only allowed pre-RA I think not having
> "=r" is correct
> BUT I don't see anything blocking this pattern from matching post RA.
> So I think we want to add the "=r" and remove the !reload_completed
> check below just in case. I doubt it will match because it requires
> the combining of a few (more than 2) different instructions together
> which does not happen post RA.

This pattern is created during combine pass and can be split before
RA, because nothing depends on the type of allocated registers. Before
RA, pseudos are allowed, so you can freely allocate new pseudos using
gen_rtx_reg (). It is much easier to split the insn before RA, the
insn only needs to define the correct operand predicate, and there is
no need to define operand constraints.
>
> >
> > > +  "!TARGET_CSSC"
> > > +  "#"
> > > +  "&& !reload_completed"

Uh... You should use something like x86's ix86_pre_reload_split () in
the insn condition instead of !reload_completed in the split
condition. If something matches the condition post RA (maybe
late_combine2 pass), the compiler will ICE because it won't be able to
split the insn.

> > > +  [(parallel
> > > +      [(set (reg:CC_C CC_REGNUM)
> > > +         (compare:CC_C (plus:GPI (match_dup ovf_commutate)
> > > +                                 (match_dup <ovf_comm_opp>))
> > > +                       (match_dup ovf_commutate)))
> > > +       (set (match_dup 3) (plus:GPI (match_dup ovf_commutate)
> > > +                                 (match_dup <ovf_comm_opp>)))])
> > > +   (set (match_dup 0)
> > > +     (if_then_else:GPI (<ovf_add_cmp> (reg:CC_C CC_REGNUM)
> > > +                                      (const_int 0))
> > > +                       (match_dup 3)
> > > +                       (match_dup ovf_commutate)))]
> > > +  {
> > > +    if (GET_CODE (operands[3]) == SCRATCH)
> > > +      operands[3] = gen_reg_rtx (<MODE>mode);
> >
> > At first I was a bit unsure about this idiom, since md.texi:9358 has
> > this to say about define_split (and therefore about
> > define_insn_and_split):
> >
> >     The @var{preparation-statements} are similar to those statements that
> >     are specified for @code{define_expand} (@pxref{Expander Definitions})
> >     and are executed before the new RTL is generated to prepare for the
> >     generated code or emit some insns whose pattern is not fixed.  Unlike
> >     those in @code{define_expand}, however, these statements must not
> >     generate any new pseudo-registers.  Once reload has completed, they also
> >     must not allocate any space in the stack frame.
> >
> > but I see that we already use this idiom for a few patterns in 
> > aarch64-sve.md.
> > Would appreciate a second opinion on this.  Either the existing SVE 
> > patterns are
> > wrong or (perhaps more likely) the md.texi docs are out of date.

This approach is quite strange. Before RA, you are always allowed to
allocate pseudo with gen_reg_rtx. So, just unconditionally use:
"operands[3] = gen_reg_rtx (<MODE>mode);"

Please find attached x86 version of the patch I am testing. It also
uses an existing non-overflow SUB RTX pattern, as Alex proposed.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 218377a1770..5125980e453 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -27353,6 +27353,69 @@ (define_peephole2
        (match_dup 0))]
   "peep2_reg_dead_p (2, operands[0])"
   [(set (match_dup 2) (match_dup 1))])
+
+;; umax (a, add (a, b)) => [sum, ovf] = add (a, b); ovf ? a : sum
+;; umin (a, add (a, b)) => [sum, ovf] = add (a, b); ovf ? sum : a
+
+(define_code_attr ovf_add_cmp [(umax "geu") (umin "ltu")])
+
+(define_int_iterator ovf_comm [1 2])
+(define_int_attr ovf_comm_op [(1 "2") (2 "1")])
+
+(define_insn_and_split "*plus_within_<code><mode>3_<ovf_comm>"
+  [(set (match_operand:SWI248 0 "register_operand")
+       (umaxmin:SWI248
+         (plus:SWI248 (match_operand:SWI248 1 "nonimmediate_operand")
+                      (match_operand:SWI248 2 "nonimmediate_operand"))
+         (match_dup ovf_comm)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_CMOVE
+   && ix86_binary_operator_ok (PLUS, <MODE>mode, operands, TARGET_APX_NDD)
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel
+     [(set (reg:CCC FLAGS_REG)
+          (compare:CCC
+            (plus:SWI248 (match_dup ovf_comm) (match_dup <ovf_comm_op>))
+            (match_dup ovf_comm)))
+      (set (match_dup 3)
+          (plus:SWI248 (match_dup ovf_comm) (match_dup <ovf_comm_op>)))])
+   (set (match_dup 0)
+       (if_then_else:SWI248
+         (<ovf_add_cmp> (reg:CCC FLAGS_REG) (const_int 0))
+         (match_dup 3)
+         (match_dup ovf_comm)))]
+  "operands[3] = gen_reg_rtx (<MODE>mode);")
+
+;; umax (a, sub (a, b)) => [diff, udf] = sub (a, b); udf ? diff : a
+;; umin (a, sub (a, b)) => [diff, udf] = sub (a, b); udf ? a : diff
+
+(define_code_attr udf_sub_cmp [(umax "ltu") (umin "geu")])
+
+(define_insn_and_split "*minus_within_<code><mode>3"
+  [(set (match_operand:SWI248 0 "register_operand")
+       (umaxmin:SWI248
+         (minus:SWI248 (match_operand:SWI248 1 "nonimmediate_operand")
+                       (match_operand:SWI248 2 "nonimmediate_operand"))
+         (match_dup 1)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_CMOVE
+   && ix86_binary_operator_ok (MINUS, <MODE>mode, operands, TARGET_APX_NDD)
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel
+     [(set (reg:CC FLAGS_REG)
+          (compare:CC (match_dup 1) (match_dup 2)))
+      (set (match_dup 3)
+          (minus:SWI248 (match_dup 1) (match_dup 2)))])
+   (set (match_dup 0)
+       (if_then_else:SWI248
+         (<udf_sub_cmp> (reg:CC FLAGS_REG) (const_int 0))
+         (match_dup 3)
+         (match_dup 1)))]
+  "operands[3] = gen_reg_rtx (<MODE>mode);")
 
 ;; Misc patterns (?)
 

Reply via email to