https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66948
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What             |Removed                      |Added
----------------------------------------------------------------------------
             Status         |UNCONFIRMED                  |ASSIGNED
   Last reconfirmed         |                             |2015-07-21
           Assignee         |unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org
   Target Milestone         |---                          |6.0
            Summary         |Performance regression in    |[6 Regression] Performance
                            |bit manipulation code        |regression in bit
                            |                             |manipulation code
     Ever confirmed         |0                            |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Possibly

/* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
        (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
(for shift (lshift rshift)
 (simplify
  (shift (convert? (bit_and @0 INTEGER_CST@2)) INTEGER_CST@1)
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
    (bit_and (shift (convert @0) @1) { mask; })))))

which in the fold-const.c variant was conditionalized on "if the latter can be
further optimized" - a condition that cannot be expressed in a useful way in the
match.pd language.  Adding single-use markers doesn't help; disabling the
pattern does.

On my machine the difference isn't that big - 6.9s vs 6.4s.  The .optimized
dump shows a lot more operations on trunk compared to GCC 5.

Note the above pattern catches, for example, (v & 1431655765) >> 1 in

  v = v & 1431655765;
  v = (v >> 1 | v) & 858993459;

Ah, I see an issue.  Mine.