http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033

            Bug ID: 58033
           Summary: counterproductive bb-reorder
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: olegendo at gcc dot gnu.org
                CC: steven at gcc dot gnu.org, tejohnson at google dot com
            Target: sh*-*-*

On SH, compiling the following code with -O2

#include <bitset>

std::bitset<32> make_bits (void)
{
  std::bitset<32> r;
  for (auto&& i : { 4, 5, 6, 10 })
    if (i < r.size ())
      r.set (i);
  return r;
}

results in the following code:

        mov.l   .L8,r1
        mov     #0,r0
        mov     #31,r7
        mov     #1,r6
        mov     #4,r2
.L2:
        mov.l   @r1,r3
        cmp/hi  r7,r3
        bf/s    .L7
        mov     r6,r5
.L3:
        dt      r2
        bf/s    .L2     // branch if value not > 31, i.e. in each iteration
        add     #4,r1
        rts
        nop
        .align 1
.L7:
        shld    r3,r5
        bra     .L3
        or      r5,r0
.L9:
        .align 2
.L8:
        .long   _._45+0

_._45:
        .long   4
        .long   5
        .long   6
        .long   10

Disabling the bb-reorder pass or using -Os results in more compact and faster
code:

        mov.l   .L7,r1
        mov     #0,r0
        mov     #31,r7
        mov     #1,r6
        mov     #4,r2
.L2:
        mov.l   @r1,r3
        cmp/hi  r7,r3
        bt/s    .L3     // branch if value > 31, i.e. never.
        mov     r6,r5
        shld    r3,r5
        or      r5,r0
.L3:
        dt      r2
        bf/s    .L2
        add     #4,r1
        rts
        nop

Of course the bb-reorder pass doesn't know that the values in this case are
always in range. Still, maybe it could be improved by not splitting out a BB
if it consists only of a few insns?
I've tried increasing the branch cost but it won't do anything.

Teresa, Steven,