http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #2 from Linus Torvalds <torva...@linux-foundation.org> 2011-05-21 18:41:15 UTC ---
(In reply to comment #1)
>
> On the RTL side combine tries to do
>
> Trying 7, 8 -> 9:
> Failed to match this instruction:
> (parallel [
>         (set (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8 A64])
>             (plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8 A64])
>                 (const_int -1 [0xffffffffffffffff])))
>         (set (reg/f:DI 60 [ D.2723 ])
>             (plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8 A64])
>                 (const_int -1 [0xffffffffffffffff])))
>     ])
>
> because we have a use of the decrement result in the comparison. It doesn't
> try to combine this with the comparison though.

Why isn't there a trivial pattern for the combination of "add+cmp0"? It sounds
like a peephole optimization to me.

> So this case is really special ;) Without the use of the decremented
> value we get the desired subq $1, (%rsi).

The whole notion of "decrement and check if zero" is just about as special as
mud. And I realize that without the "check if zero" part I get the single rmw
instruction, but I was really hoping that gcc would get this kind of really
obvious code right. There is absolutely no question about what the correct
result is, and gcc simply doesn't generate it.

I'm used to gcc sometimes being confused by more complicated things (inline
asms, bitfields etc), but this is really basic code.

The load-store model is fine for a Pentium 4 - those things were not very good
at complex instructions. But it generates horribly big code, and modern x86
chips all want the "operate on memory" version.

> Manually sinking the store to *argv into the if and the else yields

Yeah. And that's pretty horrible.

> As usual combine doesn't like stores.

Is there some reason this can't just be a peephole pattern? I really thought
that gcc has done this before.

                Linus
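
For readers following along, the "decrement and check if zero" pattern discussed above can be sketched in C roughly as below. This is an illustrative assumption, not the test case attached to the bug: the names dec_and_test, on_zero and calls are hypothetical, and the assembly mentioned in the comment is the hoped-for shape rather than what any particular gcc version is known to emit.

```c
/* Hypothetical stand-in for whatever is called when the counter hits zero. */
static int calls;
static void on_zero(long *p) { (void)p; calls++; }

/* Decrement the value in memory, then branch on whether the result is zero.
 * The desired x86-64 code is a single read-modify-write such as
 * "subq $1, (mem)" followed by a conditional jump on the zero flag,
 * instead of a separate load, subtract, store and compare. */
int dec_and_test(long *counter)
{
    if (!--*counter) {
        on_zero(counter);
        return 1;   /* counter reached zero */
    }
    return 0;       /* counter is still non-zero */
}
```

Compiling a function like this with optimization enabled and reading the generated assembly (for example with gcc -O2 -S) is one way to see whether the decrement and the zero test are fused into one memory-operand instruction.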