On Wed, Dec 17, 2014 at 10:12:06AM +0100, Richard Biener wrote: > On Wed, 17 Dec 2014, Jakub Jelinek wrote: > > > On Wed, Dec 17, 2014 at 09:46:44AM +0100, Marek Polacek wrote: > > > This adds a transformation of (x & ~m) | (y & m), which (on GIMPLE) > > > has 4 ops to x ^ ((x ^ y) & m) that has 3 ops on GIMPLE. In fact, > > > the latter is then transformed to (a ^ b) & m ^ a, which also has 3 > > > ops. > > > > So why don't you transform it to ((x ^ y) & m) ^ x directly (i.e. swap > > @0 with the (bit_and ...) ? > > Yeah, I'd prefer that as well. Also you probably should make sure > the (x & ~m) and (y & m) exprs are single-use (see other examples > in match.pd using has_single_use). Ok. We have only one use of has_single_use so far in match.pd, so I hope what I have is correct... I suppose the point is not to hinder CSEing those two subexpressions if they have multiple uses.
> > BTW, the advantage of (x & ~m) | (y & m) form is that there are fewer > > dependencies, at least if the target has andn instruction (e.g. SPARC, > > Alpha, PA, MMIX, IA64 have them), just the final or depends on the result of > > both and and andnot instructions. While in the 2 xor forms, and > > depends on the first xor result and the second xor depends on the and > > result, so if there are multiple ALU units available, the and | andn > > form can use both the units, while the second one is unnecessarily > > serialized. > > Right - I've pointed this out as well. Of course this simply > asks for more clever expansion of ((x ^ y) & m) ^ x rather > than disabling this transform. Should be doable with TER > in the BIT_XOR_EXPR expansion code. Or figure out if teaching > combine/simplify RTX is better. Of course this form requires > one more register ... which means with high register pressure > or on register starved machines the GIMPLE canonical form might > be better (in some cases). Which means that LRA should know > how to transform it back? (interesting kind of "reload" ;)) I'd rather leave the expansion/combine stuff to someone else ;). Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk? 2014-12-17 Marek Polacek <pola...@redhat.com> PR middle-end/63568 * match.pd: Add (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x pattern. * gcc.dg/pr63568.c: New test. diff --git gcc/match.pd gcc/match.pd index dbca99e..4d4bc9f 100644 --- gcc/match.pd +++ gcc/match.pd @@ -382,6 +382,13 @@ along with GCC; see the file COPYING3. If not see (bit_not (bit_not @0)) @0) +/* (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x */ +(simplify + (bit_ior:c (bit_and:c@3 @0 (bit_not @2)) (bit_and:c@4 @1 @2)) + (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3)) + && (TREE_CODE (@4) != SSA_NAME || has_single_use (@4))) + (bit_xor (bit_and (bit_xor @0 @1) @2) @0))) + /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)). */ (simplify diff --git gcc/testsuite/gcc.dg/pr63568.c gcc/testsuite/gcc.dg/pr63568.c index e69de29..fb42bea 100644 --- gcc/testsuite/gcc.dg/pr63568.c +++ gcc/testsuite/gcc.dg/pr63568.c @@ -0,0 +1,54 @@ +/* PR middle-end/63568 */ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-original" } */ + +int +fn1 (int a, int b, int m) +{ + return (a & ~m) | (b & m); +} + +int +fn2 (int a, int b, int m) +{ + return (a & ~m) | (m & b); +} + +int +fn3 (int a, int b, int m) +{ + return (~m & a) | (m & b); +} + +int +fn4 (int a, int b, int m) +{ + return (~m & a) | (b & m); +} + +int +fn5 (int a, int b, int m) +{ + return (b & m) | (a & ~m); +} + +int +fn6 (int a, int b, int m) +{ + return (m & b) | (a & ~m); +} + +int +fn7 (int a, int b, int m) +{ + return (m & b) | (~m & a); +} + +int +fn8 (int a, int b, int m) +{ + return (b & m) | (~m & a); +} + +/* { dg-final { scan-tree-dump-not " \\| " "original" } } */ +/* { dg-final { cleanup-tree-dump "original" } } */ Marek