http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244
--- Comment #65 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #64)
>
> would be simplified to this:
>
>         mov.l   @(4,r4),r1
>         tst     r1,r1      // T = @(4,r4) == 0
> .L3:
>         bt/s    .L5
>         mov     #1,r1
>         cmp/hi  r1,r5
>         bf/s    .L9
>         mov     #0,r0
>         rts
>         nop
> .L2:
>         mov.l   @r4,r1
>         bra     .L3
>         tst     r1,r1      // T = @(r4) == 0

Sorry, I got confused.  The above is wrong.  One of the T bit inversions
can't be eliminated in this case.  It should be:

        mov.l   @(4,r4),r1
.L3:
        tst     r1,r1
        bt/s    .L5
        mov     #1,r1
        cmp/hi  r1,r5
        bf/s    .L9
        mov     #0,r0
        rts
        nop
.L2:
        mov.l   @r4,r1
        tst     r1,r1
        bra     .L3
        movt    r1

Or SH2A:

        mov.l   @(4,r4),r1
        tst     r1,r1
.L3:
        bt/s    .L5
        mov     #1,r1
        cmp/hi  r1,r5
        bf/s    .L9
        mov     #0,r0
        rts
        nop
.L2:
        mov.l   @r4,r1
        tst     r1,r1
        bra     .L3
        nott

However, my original 'optimized' asm snippet is valid if the reduced test
case is changed to:

static inline int
blk_oversized_queue (int* q)
{
  if (q[2])
    return q[1] == 0;  // instead of != 0
  return q[0] == 0;
}

The current trunk version eliminates the movt/tst insns and produces correct
code by accident.  It can be simplified even more:

        mov.l   @(4,r4),r1
.L3:
        tst     r1,r1
        bt/s    .L5
        mov     #1,r1
        cmp/hi  r1,r5
        bf/s    .L9
        mov     #0,r0
        rts
        nop
.L2:
        bra     .L3
        mov.l   @r4,r1

I'm trying to come up with a patch that implements t bit tracing in order to
handle those scenarios.
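
For anyone who wants to check the T bit reasoning above without reading SH asm, here is a rough C model of the value of T at the 'bt/s .L5' insn for each sequence. It is only a sketch under some assumptions: the helper names (tst, movt, nott, t_comment64, t_corrected, t_corrected_sh2a, t_tracks_result) are made up for illustration, the fall-through path is assumed to be the q[2] != 0 case and .L2 the q[2] == 0 case (the branch that selects between them is not part of the snippets), and the original test case is taken to differ from the changed one only in 'q[1] != 0'.

```c
#include <stdbool.h>

/* Reduced test case as assumed from the comment: q[1] != 0 variant. */
static int blk_oversized_queue(const int *q)
{
    if (q[2])
        return q[1] != 0;
    return q[0] == 0;
}

/* Changed test case quoted in the comment: q[1] == 0 variant. */
static int blk_oversized_queue_changed(const int *q)
{
    if (q[2])
        return q[1] == 0;
    return q[0] == 0;
}

/* Hypothetical one-line models of the SH insns involved. */
static bool tst(int rn)   { return rn == 0; }  /* tst Rn,Rn: T = (Rn == 0) */
static int  movt(bool t)  { return t ? 1 : 0; } /* movt Rn:   Rn = T        */
static bool nott(bool t)  { return !t; }        /* nott:      T = !T (SH2A) */

/* T at 'bt/s .L5' for the sequence from comment #64. */
static bool t_comment64(const int *q)
{
    if (q[2])
        return tst(q[1]);       /* mov.l @(4,r4),r1 ; tst r1,r1 */
    return tst(q[0]);           /* .L2: mov.l @r4,r1 ; tst r1,r1 */
}

/* T at 'bt/s .L5' for the corrected sequence (tst after .L3, movt on .L2). */
static bool t_corrected(const int *q)
{
    int r1;
    if (q[2]) {
        r1 = q[1];              /* mov.l @(4,r4),r1 */
    } else {
        r1 = q[0];              /* .L2: mov.l @r4,r1 */
        r1 = movt(tst(r1));     /* tst r1,r1 ; movt r1 (delay slot) */
    }
    return tst(r1);             /* .L3: tst r1,r1 */
}

/* T at 'bt/s .L5' for the SH2A sequence (nott in the delay slot). */
static bool t_corrected_sh2a(const int *q)
{
    if (q[2])
        return tst(q[1]);       /* mov.l @(4,r4),r1 ; tst r1,r1 */
    return nott(tst(q[0]));     /* .L2: mov.l @r4,r1 ; tst r1,r1 ; nott */
}

/* True if T equals fn(q) != 0 (or its inverse, when 'inverted' is set)
   on every sample input, i.e. both paths agree on the polarity of T. */
static bool t_tracks_result(bool (*t_at_branch)(const int *),
                            int (*fn)(const int *), bool inverted)
{
    static const int samples[] = { 0, 1, -1, 7 };
    const int n = (int)(sizeof samples / sizeof samples[0]);

    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            for (int c = 0; c < n; c++) {
                const int q[3] = { samples[a], samples[b], samples[c] };
                bool expected = (fn(q) != 0) != inverted;
                if (t_at_branch(q) != expected)
                    return false;
            }
    return true;
}
```

In this model t_tracks_result (t_corrected, blk_oversized_queue, true) and the SH2A variant both hold, so the two paths reach the branch with the same T polarity.  t_comment64 fails against blk_oversized_queue for either polarity, but holds against blk_oversized_queue_changed with inverted == false, which matches what is said above about the changed test case.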