Hi Richard,

Many thanks for the detailed review. I've attached an updated patch based on
your comments (bootstrapped and regtested on aarch64-linux).

> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: 29 April 2020 17:16
> To: Alex Coplan <alex.cop...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <richard.earns...@arm.com>;
> Marcus Shawcroft <marcus.shawcr...@arm.com>; Kyrylo Tkachov
> <kyrylo.tkac...@arm.com>; nd <n...@arm.com>
> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend
> contexts
> 
> Yeah.  The thing that surprised me was that the non-extending form
> has the operator in the "then" arm of the if_then_else:
> 
> (define_insn "*csinv3<mode>_insn"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>         (if_then_else:GPI
>         (match_operand 1 "aarch64_comparison_operation" "")
>         (not:GPI (match_operand:GPI 2 "register_operand" "r"))
>         (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
>   ""
>   "csinv\\t%<w>0, %<w>3, %<w>2, %M1"
>   [(set_attr "type" "csel")]
> )
> 
> whereas the new pattern has it in the "else" arm.  I think for the
> non-extending form, having the operator in the "then" arm really is
> canonical and close to guaranteed, since that's how ifcvt processes
> half diamonds.
> 
> But when both values are zero-extended, ifcvt has to convert a full
> diamond, and I'm not sure we can rely on the two sides coming out in
> a particular order.  I think the two ?: orders above work with one
> pattern thanks to simplifications that happen before entering gimple.  If instead
> the operator is split out into a separate statement:
> 
> unsigned long long
> inv1(int a, unsigned b, unsigned c)
> {
>   unsigned d = ~c;
>   return a ? b : d;
> }
> 
> then we go back to using the less optimal CSEL sequence.
> 
> So I think it would be worth having a second pattern for the opposite
> order.

Agreed. It did seem a bit magical that we would get the other case for free. I
added this function to the tests and observed that we were getting the less
optimal CSEL sequence until I added the rule with the NEG_NOT on the other arm.

> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index c7c4d1dd519..2f7367c0b1a 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -4390,6 +4390,19 @@
> >    [(set_attr "type" "csel")]
> >  )
> >
> > +(define_insn "*csinv3_uxtw_insn"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +        (if_then_else:DI
> > +          (match_operand 1 "aarch64_comparison_operation" "")
> > +          (zero_extend:DI
> > +            (match_operand:SI 2 "aarch64_reg_or_zero" "rZ"))
> > +          (zero_extend:DI
> > +            (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
> > +  ""
> > +  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
> > +  [(set_attr "type" "csel")]
> > +)
> > +
> 
> The operand to a zero_extend can never be a const_int, so operand 2
> should just be a register_operand with an "r" constraint.

OK, makes sense. I've fixed this in the updated patch.

> At the risk of feature creep :-) a useful third pattern could be
> to combine a zero-extended operator result with an existing DImode value.
> In that case, the existing DImode value really can be "rZ" and should
> always be in the "else" arm of the if_then_else.  E.g.:
> 
> unsigned long long
> f(int a, unsigned long b, unsigned c)
> {
>   return a ? b : ~c;
> }
> 
> unsigned long long
> g(int a, unsigned c)
> {
>   return a ? 0 : ~c;
> }
> 
> But that's definitely something that can be left for later.

Hm. What sequence would you like to see for the function f here with the
argument already in DImode? I don't think we can do it with just a CMP + CSINV
like in the other test cases, since you end up wanting to access b as x1 and c
as w2, which is not allowed.

Indeed, clang generates a MVN + CSEL sequence where the CSEL operates on the
64-bit registers:

f:
        mvn     w8, w2
        cmp     w0, #0
        csel    x0, x8, x1, eq
        ret

However, I agree that the restricted case where the else arm is (const_int 0) is
a worthwhile optimization (one that wasn't caught by the previous patterns since,
as you mentioned, const_ints are never zero_extended). I've added a pattern and
tests for this case in the updated patch.

Many thanks,
Alex

---

gcc/ChangeLog:

2020-04-30  Alex Coplan  <alex.cop...@arm.com>

        * config/aarch64/aarch64.md (*csinv3_uxtw_insn1): New.
        (*csinv3_uxtw_insn2): New.
        (*csinv3_uxtw_insn3): New.
        * config/aarch64/iterators.md (neg_not_cs): New.

gcc/testsuite/ChangeLog:

2020-04-30  Alex Coplan  <alex.cop...@arm.com>

        * gcc.target/aarch64/csinv-neg.c: New test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..37d651724ad 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4390,6 +4390,45 @@
   [(set_attr "type" "csel")]
 )
 
+(define_insn "*csinv3_uxtw_insn1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (if_then_else:DI
+          (match_operand 1 "aarch64_comparison_operation" "")
+          (zero_extend:DI
+            (match_operand:SI 2 "register_operand" "r"))
+          (zero_extend:DI
+            (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn2"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (if_then_else:DI
+          (match_operand 1 "aarch64_comparison_operation" "")
+          (zero_extend:DI
+            (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+          (zero_extend:DI
+            (match_operand:SI 3 "register_operand" "r"))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w3, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn3"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (if_then_else:DI
+          (match_operand 1 "aarch64_comparison_operation" "")
+          (zero_extend:DI
+            (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+          (const_int 0)))]
+  ""
+  "cs<neg_not_cs>\\t%w0, wzr, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
+
 ;; If X can be loaded by a single CNT[BHWD] instruction,
 ;;
 ;;    A = UMAX (B, X)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8e434389e59..a568cf21b99 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1932,6 +1932,9 @@
 ;; Operation names for negate and bitwise complement.
 (define_code_attr neg_not_op [(neg "neg") (not "not")])
 
+;; csinv, csneg insn suffixes.
+(define_code_attr neg_not_cs [(neg "neg") (not "inv")])
+
 ;; Similar, but when the second operand is inverted.
 (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
 
diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-neg.c b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
new file mode 100644
index 00000000000..cc64b4094d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
@@ -0,0 +1,104 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/*
+** inv1:
+**     cmp     w0, 0
+**     csinv   w0, w1, w2, ne
+**     ret
+*/
+unsigned long long
+inv1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : ~c;
+}
+
+/*
+** inv1_local:
+**     cmp     w0, 0
+**     csinv   w0, w1, w2, ne
+**     ret
+*/
+unsigned long long
+inv1_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~c;
+  return a ? b : d;
+}
+
+/*
+** inv_zero1:
+**     cmp     w0, 0
+**     csinv   w0, wzr, w1, ne
+**     ret
+*/
+unsigned long long
+inv_zero1(unsigned a, unsigned b)
+{
+  return a ? 0 : ~b;
+}
+
+/*
+** inv_zero2:
+**     cmp     w0, 0
+**     csinv   w0, wzr, w1, eq
+**     ret
+*/
+unsigned long long
+inv_zero2(unsigned a, unsigned b)
+{
+  return a ? ~b : 0;
+}
+
+
+/*
+** inv2:
+**     cmp     w0, 0
+**     csinv   w0, w2, w1, eq
+**     ret
+*/
+unsigned long long
+inv2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? ~b : c;
+}
+
+/*
+** inv2_local:
+**     cmp     w0, 0
+**     csinv   w0, w2, w1, eq
+**     ret
+*/
+unsigned long long
+inv2_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~b;
+  return a ? d : c;
+}
+
+/*
+** neg1:
+**     cmp     w0, 0
+**     csneg   w0, w1, w2, ne
+**     ret
+*/
+unsigned long long
+neg1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : -c;
+}
+
+
+/*
+** neg2:
+**     cmp     w0, 0
+**     csneg   w0, w2, w1, eq
+**     ret
+*/
+unsigned long long
+neg2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? -b : c;
+}
+
+/* { dg-final { check-function-bodies "**" "" "" } } */
