Hi Richard,

Many thanks for the detailed review. I've attached an updated patch based on
your comments (bootstrapped and regtested on aarch64-linux).
> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: 29 April 2020 17:16
> To: Alex Coplan <alex.cop...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw <richard.earns...@arm.com>;
> Marcus Shawcroft <marcus.shawcr...@arm.com>; Kyrylo Tkachov
> <kyrylo.tkac...@arm.com>; nd <n...@arm.com>
> Subject: Re: [PATCH] aarch64: prefer using csinv, csneg in zero extend
> contexts
>
> Yeah.  The thing that surprised me was that the non-extending form
> has the operator in the "then" arm of the if_then_else:
>
> (define_insn "*csinv3<mode>_insn"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
> 	(if_then_else:GPI
> 	  (match_operand 1 "aarch64_comparison_operation" "")
> 	  (not:GPI (match_operand:GPI 2 "register_operand" "r"))
> 	  (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
>   ""
>   "csinv\\t%<w>0, %<w>3, %<w>2, %M1"
>   [(set_attr "type" "csel")]
> )
>
> whereas the new pattern has it in the "else" arm.  I think for the
> non-extending form, having the operator in the "then" arm really is
> canonical and close to guaranteed, since that's how ifcvt processes
> half diamonds.
>
> But when both values are zero-extended, ifcvt has to convert a full
> diamond, and I'm not sure we can rely on the two sides coming out in
> a particular order.  I think the two ?: orders above work with one
> pattern thanks to simplifications that happen before entering gimple.
> If instead the operator is split out into a separate statement:
>
> unsigned long long
> inv1(int a, unsigned b, unsigned c)
> {
>   unsigned d = ~c;
>   return a ? b : d;
> }
>
> then we go back to using the less optimal CSEL sequence.
>
> So I think it would be worth having a second pattern for the opposite
> order.

Agreed. It did seem a bit magical that we would get the other case for free.
I added this function to the tests and observed that we were getting the less
optimal CSEL sequence until I added the rule with the NEG_NOT on the other
arm.
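For reference, here is a standalone C sketch of the two statement orders under discussion (the second mirrors the inv2_local function added to the tests). With the complement split into its own statement on the "then" arm, ifcvt sees the full diamond in the opposite order, which is what the second pattern has to catch. The function names and values here are illustrative only, not part of the patch:

```c
#include <assert.h>

/* Complement on the "else" arm: the order matched by the first
   zero-extending pattern.  */
unsigned long long
inv_else(int a, unsigned b, unsigned c)
{
  unsigned d = ~c;
  return a ? b : d;
}

/* Complement on the "then" arm: needs the pattern with NEG_NOT on the
   other arm of the if_then_else.  */
unsigned long long
inv_then(int a, unsigned b, unsigned c)
{
  unsigned d = ~b;
  return a ? d : c;
}
```

Either way the selected 32-bit value is zero-extended to 64 bits on return, e.g. inv_then(1, 5u, 7u) yields 0xFFFFFFFA.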
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index c7c4d1dd519..2f7367c0b1a 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -4390,6 +4390,19 @@
> >    [(set_attr "type" "csel")]
> >  )
> >
> > +(define_insn "*csinv3_uxtw_insn"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +	(if_then_else:DI
> > +	  (match_operand 1 "aarch64_comparison_operation" "")
> > +	  (zero_extend:DI
> > +	    (match_operand:SI 2 "aarch64_reg_or_zero" "rZ"))
> > +	  (zero_extend:DI
> > +	    (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
> > +  ""
> > +  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
> > +  [(set_attr "type" "csel")]
> > +)
> > +
>
> The operand to a zero_extend can never be a const_int, so operand 2
> should just be a register_operand with an "r" constraint.

OK, makes sense. I've fixed this in the updated patch.

> At the risk of feature creep :-) a useful third pattern could be
> to combine a zero-extended operator result with an existing DImode value.
> In that case, the existing DImode value really can be "rZ" and should
> always be in the "else" arm of the if_then_else.  E.g.:
>
> unsigned long long
> f(int a, unsigned long b, unsigned c)
> {
>   return a ? b : ~c;
> }
>
> unsigned long long
> g(int a, unsigned c)
> {
>   return a ? 0 : ~c;
> }
>
> But that's definitely something that can be left for later.

Hm. What sequence would you like to see for the function f here, where the
argument b is already in DImode? I don't think we can do it with just a
CMP + CSINV as in the other test cases, since the CSINV would need to access
b as x1 but c as w2, which is not allowed.
Indeed, clang generates an MVN + CSEL sequence where the CSEL operates on the
64-bit registers:

f:
	mvn	w8, w2
	cmp	w0, #0
	csel	x0, x8, x1, eq
	ret

However, I agree that the restricted case where the else arm is (const_int 0)
is a worthwhile optimization, one that wasn't caught by the previous patterns
since const_ints are never zero_extended, as you mentioned. I've added a
pattern and tests for this case in the updated patch.

Many thanks,
Alex

---

gcc/ChangeLog:

2020-04-30  Alex Coplan  <alex.cop...@arm.com>

	* config/aarch64/aarch64.md (*csinv3_uxtw_insn1): New.
	(*csinv3_uxtw_insn2): New.
	(*csinv3_uxtw_insn3): New.
	* config/aarch64/iterators.md (neg_not_cs): New.

gcc/testsuite/ChangeLog:

2020-04-30  Alex Coplan  <alex.cop...@arm.com>

	* gcc.target/aarch64/csinv-neg.c: New test.
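As a side note on semantics (not part of the patch), the behaviour the quoted f and g are expected to preserve can be checked in plain C. This sketch assumes LP64, so b is written as unsigned long long rather than the original unsigned long; the usual arithmetic conversions zero-extend the 32-bit ~c before the selection:

```c
#include <assert.h>

/* Mixed-width select: b is already 64-bit, while ~c is computed in
   32 bits and then zero-extended by the usual arithmetic conversions.  */
unsigned long long
f(int a, unsigned long long b, unsigned c)
{
  return a ? b : ~c;
}

/* Restricted case: a zero-extended complement selected against a
   literal zero, the shape matched by the new (const_int 0) pattern.  */
unsigned long long
g(int a, unsigned c)
{
  return a ? 0 : ~c;
}
```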
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..37d651724ad 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4390,6 +4390,45 @@
   [(set_attr "type" "csel")]
 )
 
+(define_insn "*csinv3_uxtw_insn1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (match_operand:SI 2 "register_operand" "r"))
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn2"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+	  (zero_extend:DI
+	    (match_operand:SI 3 "register_operand" "r"))))]
+  ""
+  "cs<neg_not_cs>\\t%w0, %w3, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
+(define_insn "*csinv3_uxtw_insn3"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(if_then_else:DI
+	  (match_operand 1 "aarch64_comparison_operation" "")
+	  (zero_extend:DI
+	    (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
+	  (const_int 0)))]
+  ""
+  "cs<neg_not_cs>\\t%w0, wzr, %w2, %M1"
+  [(set_attr "type" "csel")]
+)
+
+
 ;; If X can be loaded by a single CNT[BHWD] instruction,
 ;;
 ;; A = UMAX (B, X)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8e434389e59..a568cf21b99 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1932,6 +1932,9 @@
 ;; Operation names for negate and bitwise complement.
 (define_code_attr neg_not_op [(neg "neg") (not "not")])
 
+;; csinv, csneg insn suffixes.
+(define_code_attr neg_not_cs [(neg "neg") (not "inv")])
+
 ;; Similar, but when the second operand is inverted.
 (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-neg.c b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
new file mode 100644
index 00000000000..cc64b4094d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csinv-neg.c
@@ -0,0 +1,104 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/*
+** inv1:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : ~c;
+}
+
+/*
+** inv1_local:
+**	cmp	w0, 0
+**	csinv	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+inv1_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~c;
+  return a ? b : d;
+}
+
+/*
+** inv_zero1:
+**	cmp	w0, 0
+**	csinv	w0, wzr, w1, ne
+**	ret
+*/
+unsigned long long
+inv_zero1(unsigned a, unsigned b)
+{
+  return a ? 0 : ~b;
+}
+
+/*
+** inv_zero2:
+**	cmp	w0, 0
+**	csinv	w0, wzr, w1, eq
+**	ret
+*/
+unsigned long long
+inv_zero2(unsigned a, unsigned b)
+{
+  return a ? ~b : 0;
+}
+
+/*
+** inv2:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? ~b : c;
+}
+
+/*
+** inv2_local:
+**	cmp	w0, 0
+**	csinv	w0, w2, w1, eq
+**	ret
+*/
+unsigned long long
+inv2_local(unsigned a, unsigned b, unsigned c)
+{
+  unsigned d = ~b;
+  return a ? d : c;
+}
+
+/*
+** neg1:
+**	cmp	w0, 0
+**	csneg	w0, w1, w2, ne
+**	ret
+*/
+unsigned long long
+neg1(unsigned a, unsigned b, unsigned c)
+{
+  return a ? b : -c;
+}
+
+/*
+** neg2:
+**	cmp	w0, 0
+**	csneg	w0, w2, w1, eq
+**	ret
*/
+unsigned long long
+neg2(unsigned a, unsigned b, unsigned c)
+{
+  return a ? -b : c;
+}
+
+/* { dg-final { check-function-bodies "**" "" "" } } */