http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #3 from Linus Torvalds <torva...@linux-foundation.org> 2011-05-21 20:42:26 UTC ---
Hmm. Looking at that code generation, it strikes me that even with the odd load/store situation, why do we have that "test" instruction?

       c:   8b 10                   mov    (%eax),%edx
       e:   83 ea 01                sub    $0x1,%edx
      11:   85 d2                   test   %edx,%edx
      13:   89 10                   mov    %edx,(%eax)
      15:   74 09                   je     20 <main+0x20>

IOW, regardless of any complexities of the store, that "sub + test" is just odd. Gcc knows to simplify that particular sequence in other situations, so why doesn't it simplify it here? IOW, I can make gcc generate code like

       c:   83 e8 01                sub    $0x1,%eax
       f:   75 07                   jne    18 <main+0x18>

with no real problem when the value is in registers. No "test" instruction after the sub.

Why does that store matter so much? It looks like the combine is being driven by the conditional branch, and then when the instruction just before the conditional branch is that store, everything kind of goes to hell.

Would it be possible to have a peephole for the "arithmetic/logical + compare-with-zero" case (causing us to just drop the compare), and then have a separate peephole optimization that triggers on "load + op + store with dead reg" and turns that into an "op to mem" case?

The MD files do confuse me, so maybe there is some fundamental limitation to the peephole patterns that makes this impossible?
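To make the two-step peephole proposal concrete, here is a toy model in C of both rewrites running in order over the five-instruction sequence from the dump. Everything in it is hypothetical: the `enum op` instruction set, the `struct insn` layout and the function names are invented for illustration and are not GCC's RTL or the `define_peephole2` patterns a real fix would be written in. It assumes, as on x86, that a `mov` store leaves the flags alone and that `sub` to memory sets ZF exactly like `sub` to a register.

```c
#include <stddef.h>

/* Hypothetical toy instruction set, loosely modeled on the x86 dump.
 * Assumes STORE does not touch the flags and that SUB_IMM and
 * SUB_MEM_IMM both set ZF from their result. */
enum op { LOAD, SUB_IMM, TEST_SELF, STORE, JE, SUB_MEM_IMM };

struct insn {
    enum op op;
    int reg; /* register operand, -1 if none */
    int imm; /* immediate for SUB_IMM / SUB_MEM_IMM */
    int mem; /* memory operand id, -1 if none */
};

/* Does this instruction read register `reg`? (LOAD only writes it.) */
static int reads_reg(const struct insn *in, int reg)
{
    return (in->op == SUB_IMM || in->op == TEST_SELF || in->op == STORE) &&
           in->reg == reg;
}

/* Toy liveness: `reg` is dead after index i if it is reloaded before any
 * read, or never read again (assumed dead at the end of the sequence). */
static int reg_dead_after(const struct insn *code, size_t n, size_t i, int reg)
{
    for (size_t j = i + 1; j < n; j++) {
        if (reads_reg(&code[j], reg))
            return 0;
        if (code[j].op == LOAD && code[j].reg == reg)
            return 1;
    }
    return 1;
}

/* Peephole 1: "sub reg" immediately followed by "test reg,reg" -> drop the
 * test, since the sub already set ZF from the same value.
 * Compacts the array in place and returns the new length. */
static size_t drop_redundant_test(struct insn *code, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].op == TEST_SELF && out > 0 &&
            code[out - 1].op == SUB_IMM && code[out - 1].reg == code[i].reg)
            continue;
        code[out++] = code[i];
    }
    return out;
}

/* Peephole 2: "load r,m; sub r,k; store r,m" with r dead afterwards ->
 * "sub k,m" (op to memory). The flags come out the same, so a following
 * JE still sees the right ZF. Compacts in place, returns the new length. */
static size_t fold_into_mem(struct insn *code, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 2 < n &&
            code[i].op == LOAD && code[i + 1].op == SUB_IMM &&
            code[i + 2].op == STORE &&
            code[i].reg == code[i + 1].reg && code[i].reg == code[i + 2].reg &&
            code[i].mem == code[i + 2].mem &&
            reg_dead_after(code, n, i + 2, code[i].reg)) {
            struct insn folded = { SUB_MEM_IMM, -1, code[i + 1].imm, code[i].mem };
            code[out++] = folded;
            i += 2; /* skip the sub and the store that were folded */
            continue;
        }
        code[out++] = code[i];
    }
    return out;
}
```

Fed the load, sub, test, store, je sequence, the first pass leaves load, sub, store, je, and the second leaves just the memory `sub` plus `je`. Order matters in this model: the `test` sits between the `sub` and the store, so the load/op/store fold can only match once the first pass has removed it.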