On Tue, Jul 11, 2023 at 3:08 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches wrote:
> > On Wed, Jul 5, 2023 at 3:42 PM Drew Ross via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
> > > Tested successfully on x86_64 and x86 targets.
> > >
> > >         PR middle-end/109986
> > >
> > > gcc/ChangeLog:
> > >
> > >         * match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.c-torture/execute/pr109986.c: New test.
> > >         * gcc.dg/tree-ssa/pr109986.c: New test.
> > > ---
> > >  gcc/match.pd                             |  11 ++
> > >  .../gcc.c-torture/execute/pr109986.c     |  41 ++++
> > >  gcc/testsuite/gcc.dg/tree-ssa/pr109986.c | 177 ++++++++++++++++++
> > >  3 files changed, 229 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index a17d6838c14..d9d7d932881 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >    (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> > >     (convert (bit_and @1 (bit_not @0)))))
> > >
> > > +/* (~X | Y) ^ X -> ~(X & Y).  */
> > > +(simplify
> > > + (bit_xor:c (nop_convert1?
> > > +             (bit_ior:c (nop_convert2? (bit_not (nop_convert3? @0)))
> > > +                        @1)) (nop_convert4? @0))
> >
> > you want to reduce the number of nop_convert? - for example
> > I wonder if we can canonicalize
> >
> > (T)~X and ~(T)X
> >
> > for nop-conversions.  The same might apply to binary bitwise operations
> > where we should push those to a direction where they are likely eliminated.
> > Usually we'd push them outwards.
> >
> > The issue with the above pattern is that nop_convertN? expands to 2^N
> > separate patterns.  Together with the two :c you get 64 out of this.
> >
> > I do not see that all of the combinations can happen when X has to
> > match unless we fail to contract some of them like if we have
> > (unsigned)(~(signed)X | Y) ^ X which we could rewrite like
> > -> (unsigned)((signed)~X | Y) ^ X -> (~X | (unsigned) Y) ^ X
> > with the last step being somewhat difficult unless we do
> > (signed)~X | Y -> (signed)(~X | (unsigned)Y).  It feels like a
> > propagation problem and less of a direct pattern matching one.
>
> The nop_convert1? in the pattern might seem to be unnecessary
> for cases like:
> int i, j, k, l;
> unsigned u, v, w, x;
>
> void
> foo (void)
> {
>   int t0 = i;
>   int t1 = (~t0) | j;
>   x = t1 ^ (unsigned) t0;
>   unsigned t2 = u;
>   unsigned t3 = (~t2) | v;
>   i = ((int) t3) ^ (int) t2;
> }
> we actually optimize it with or without the nop_convert1? in place,
> because we have the
> /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
>    when profitable.
> ...
>       (bitop (convert@2 @0) (convert?@3 @1))
> ...
>        (convert (bitop @0 (convert @1)))))
> simplification.
> Except that on
> void
> bar (void)
> {
>   unsigned t0 = u;
>   int t1 = (~(int) t0) | j;
>   x = t1 ^ t0;
>   int t2 = i;
>   unsigned t3 = (~(unsigned) t2) | v;
>   i = ((int) t3) ^ t2;
> }
> the optimization doesn't trigger without the nop_convert1? and does
> with it.
>
> Perhaps we could get rid of nop_convert3? and nop_convert4?
> by introducing a macro/inline function predicate like:
> bitwise_equal_p (expr1, expr2) and instead of using
> (nop_convert3? @0) and (nop_convert4? @0) in the pattern
> use @0 and @2 and then add
>   if (bitwise_equal_p (@0, @2))
> to the condition.
> For GENERIC (i.e. in generic-match-head.cc) it could be something like:
> static inline bool
> bitwise_equal_p (tree expr1, tree expr2)
> {
>   STRIP_NOPS (expr1);
>   STRIP_NOPS (expr2);
>   if (expr1 == expr2)
>     return true;
>   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
>     return false;
>   if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
>     return wi::to_wide (expr1) == wi::to_wide (expr2);
>   return operand_equal_p (expr1, expr2, 0);
> }
> (The INTEGER_CST special case is there because operand_equal_p compares
> wi::to_widest, which could differ if one constant is signed and the
> other unsigned.)
> For GIMPLE, I wonder if it shouldn't be a macro that takes valueize into
> account, and do something like:
> #define bitwise_equal_p(expr1, expr2) gimple_bitwise_equal_p (expr1, expr2, valueize)
>
> bool gimple_nop_convert (tree, tree *, tree (*)(tree));
>
> static inline bool
> gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
> {
>   if (expr1 == expr2)
>     return true;
>   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
>     return false;
>   if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
>     return wi::to_wide (expr1) == wi::to_wide (expr2);
>   if (operand_equal_p (expr1, expr2, 0))
>     return true;
>   tree expr3, expr4;
>   if (!gimple_nop_convert (expr1, &expr3, valueize))
>     expr3 = expr1;
>   if (!gimple_nop_convert (expr2, &expr4, valueize))
>     expr4 = expr2;
>   if (expr1 != expr3)
>     {
>       if (operand_equal_p (expr3, expr2, 0))
>         return true;
>       if (expr2 != expr4 && operand_equal_p (expr3, expr4, 0))
>         return true;
>     }
>   if (expr2 != expr4 && operand_equal_p (expr1, expr4, 0))
>     return true;
>   return false;
> }
>
> Completely untested.  What do you think?
> Though that still only brings us down to 16 cases of this.
I guess we can also not worry and hope for a better code generator ...

The obvious improvement there is to delay the expansion of 'for' and '?'
until we get two patterns on the same sub-tree.  Patterns that are the
only candidates at some point during sub-tree matching could then be
expanded with code generation optimized for code size (':c' is the only
difficult case there).  Matching the shortest paths to a leaf first
might then improve things further.  But this is a complete rewrite of
the decision tree builder, so ...

Richard.

>
>         Jakub
>