https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66948

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2015-07-21
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Target Milestone|---                         |6.0
            Summary|Performance regression in   |[6 Regression] Performance
                   |bit manipulation code       |regression in bit
                   |                            |manipulation code
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Possibly

/* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
   (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
(for shift (lshift rshift)
 (simplify
  (shift (convert? (bit_and @0 INTEGER_CST@2)) INTEGER_CST@1)
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
    (bit_and (shift (convert @0) @1) { mask; })))))

which in the fold-const.c variant was conditionalized on

"if the latter can be further optimized."

which cannot be expressed in a useful way in the match.pd language.  Adding
single-use markers doesn't help; disabling the pattern does.  On my machine
the difference isn't that big: 6.9s vs. 6.4s.

The .optimized dump shows a lot more operations in trunk compared to GCC 5.

Note the above pattern catches, for example, (v & 1431655765) >> 1 in

  v = v & 1431655765;
  v = (v >> 1 | v) & 858993459;

Ah, I see an issue.  Mine.
