Re: [PATCH] Add (x & ~m) | (y & m) folding (PR middle-end/63568)

Marek Polacek Wed, 17 Dec 2014 03:30:08 -0800

On Wed, Dec 17, 2014 at 10:12:06AM +0100, Richard Biener wrote:
> On Wed, 17 Dec 2014, Jakub Jelinek wrote:
> 
> > On Wed, Dec 17, 2014 at 09:46:44AM +0100, Marek Polacek wrote:
> > > This adds a transformation of (x & ~m) | (y & m), which has 4 ops on
> > > GIMPLE, to x ^ ((x ^ y) & m), which has 3 ops on GIMPLE.  In fact,
> > > the latter is then transformed to ((x ^ y) & m) ^ x, which also has 3
> > > ops.
> > 
> > So why don't you transform it to ((x ^ y) & m) ^ x directly (i.e. swap
> > @0 with the (bit_and ...))?
> 
> Yeah, I'd prefer that as well.  Also you probably should make sure
> the (x & ~m) and (y & m) exprs are single-use (see other examples
> in match.pd using has_single_use).
 
Ok.  match.pd has only one use of has_single_use so far, so I hope
what I have is correct...  I suppose the point is to avoid hindering
CSE of those two subexpressions when they have multiple uses.
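
A small illustration of why the single-use guard matters.  This is only
a sketch: the function names are made up here and are not part of the
patch, and unsigned is used just to keep the addition well defined.
When (a & ~m) has a second use, folding the ior does not remove it, so
the folded version ends up with more bit operations, not fewer.

```c
#include <assert.h>

/* Hypothetical example: t = a & ~m has a use besides the ior.  */
unsigned
multi_use_original (unsigned a, unsigned b, unsigned m)
{
  unsigned t = a & ~m;        /* not, and */
  unsigned r = t | (b & m);   /* and, ior: 4 bit ops in total */
  return r + t;               /* second use of t */
}

/* The same function if the fold were applied regardless of uses.  */
unsigned
multi_use_folded (unsigned a, unsigned b, unsigned m)
{
  unsigned t = a & ~m;            /* not, and: still needed for the add */
  unsigned r = ((a ^ b) & m) ^ a; /* xor, and, xor: 5 bit ops in total */
  return r + t;
}

/* Both versions compute the same value; only the op count differs.  */
void
check_multi_use (void)
{
  assert (multi_use_original (0xF0u, 0x0Fu, 0x3Cu)
          == multi_use_folded (0xF0u, 0x0Fu, 0x3Cu));
}
```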


> > BTW, the advantage of the (x & ~m) | (y & m) form is that there are fewer
> > dependencies, at least if the target has an andn instruction (e.g. SPARC,
> > Alpha, PA, MMIX, IA64 have them): only the final or depends on the results
> > of both the and and the andn instructions.  In the two xor forms, the and
> > depends on the first xor result and the second xor depends on the and
> > result, so if there are multiple ALU units available, the and | andn
> > form can use both units, while the xor form is unnecessarily
> > serialized.
> 
> Right - I've pointed this out as well.  Of course this simply
> asks for more clever expansion of ((x ^ y) & m) ^ x rather
> than disabling this transform.  Should be doable with TER
> in the BIT_XOR_EXPR expansion code.  Or figure out if teaching
> combine/simplify RTX is better.  Of course this form requires
> one more register ... which means with high register pressure
> or on register starved machines the GIMPLE canonical form might
> be better (in some cases).  Which means that LRA should know
> how to transform it back? (interesting kind of "reload" ;))

I'd rather leave the expansion/combine stuff to someone else ;).
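
For reference, the two forms discussed above can be written out in C
with the dependencies spelled out.  This is a sketch, not part of the
patch: the function names are invented for illustration, and the
exhaustive check covers 8-bit operands only (the identity holds bit by
bit, so wider operands behave the same way).

```c
#include <assert.h>

/* Original form: the two ands do not depend on each other; only the
   final ior waits on both results.  */
unsigned
select_ior (unsigned x, unsigned y, unsigned m)
{
  unsigned keep = x & ~m;   /* independent of the next statement */
  unsigned take = y & m;
  return keep | take;
}

/* Folded form: each operation consumes the previous result.  */
unsigned
select_xor (unsigned x, unsigned y, unsigned m)
{
  unsigned d = x ^ y;
  unsigned s = d & m;       /* waits on d */
  return s ^ x;             /* waits on s */
}

/* Assert that both forms agree for every 8-bit x, y and m.  */
void
check_forms_agree_8bit (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      for (unsigned m = 0; m < 256; m++)
        assert (select_ior (x, y, m) == select_xor (x, y, m));
}
```

Per bit: where m is 0 both forms yield x, and where m is 1 the xor form
yields (x ^ y) ^ x, which is y.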

Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk?

2014-12-17  Marek Polacek  <[email protected]>

        PR middle-end/63568
        * match.pd: Add (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x pattern.

        * gcc.dg/pr63568.c: New test.

diff --git gcc/match.pd gcc/match.pd
index dbca99e..4d4bc9f 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -382,6 +382,13 @@ along with GCC; see the file COPYING3.  If not see
   (bit_not (bit_not @0))
   @0)
 
+/* (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x */
+(simplify
+  (bit_ior:c (bit_and:c@3 @0 (bit_not @2)) (bit_and:c@4 @1 @2))
+  (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))
+       && (TREE_CODE (@4) != SSA_NAME || has_single_use (@4)))
+   (bit_xor (bit_and (bit_xor @0 @1) @2) @0)))
+
 
 /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)).  */
 (simplify
diff --git gcc/testsuite/gcc.dg/pr63568.c gcc/testsuite/gcc.dg/pr63568.c
index e69de29..fb42bea 100644
--- gcc/testsuite/gcc.dg/pr63568.c
+++ gcc/testsuite/gcc.dg/pr63568.c
@@ -0,0 +1,54 @@
+/* PR middle-end/63568 */
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-original" } */
+
+int
+fn1 (int a, int b, int m)
+{
+  return (a & ~m) | (b & m);
+}
+
+int
+fn2 (int a, int b, int m)
+{
+  return (a & ~m) | (m & b);
+}
+
+int
+fn3 (int a, int b, int m)
+{
+  return (~m & a) | (m & b);
+}
+
+int
+fn4 (int a, int b, int m)
+{
+  return (~m & a) | (b & m);
+}
+
+int
+fn5 (int a, int b, int m)
+{
+  return (b & m) | (a & ~m);
+}
+
+int
+fn6 (int a, int b, int m)
+{
+  return (m & b) | (a & ~m);
+}
+
+int
+fn7 (int a, int b, int m)
+{
+  return (m & b) | (~m & a);
+}
+
+int
+fn8 (int a, int b, int m)
+{
+  return (b & m) | (~m & a);
+}
+
+/* { dg-final { scan-tree-dump-not " \\| " "original" } } */
+/* { dg-final { cleanup-tree-dump "original" } } */

        Marek
