[Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test

torva...@linux-foundation.org Sat, 21 May 2011 11:50:39 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095


--- Comment #2 from Linus Torvalds <torva...@linux-foundation.org> 2011-05-21 18:41:15 UTC ---
(In reply to comment #1)
>
> On the RTL side combine tries to do
> 
> Trying 7, 8 -> 9:
> Failed to match this instruction:
> (parallel [
>         (set (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8 A64])
>             (plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8 A64])
>                 (const_int -1 [0xffffffffffffffff])))
>         (set (reg/f:DI 60 [ D.2723 ])
>             (plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8 A64])
>                 (const_int -1 [0xffffffffffffffff])))
>     ])
> 
> because we have a use of the decrement result in the comparison.  It doesn't
> try to combine this with the comparison though.

Why isn't there a trivial pattern for the combination of "add+cmp0"? It sounds
like a peephole optimization to me.

> So this case is really special ;)  Without the use of the decremented
> value we get the desired subq $1, (%rsi).

The whole notion of "decrement and check if zero" is just about as special as
mud. 

And I realize that without the "check if zero" part I get the single rmw
instruction, but I was really hoping that gcc would get this kind of really
obvious code right. There is absolutely no question about what the correct
result is, and gcc simply doesn't generate it.
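
For concreteness, the "decrement and check if zero" shape can be sketched in
C as below. This is an illustrative sketch, not the testcase attached to the
bug: the name dec_and_test and the refcount-style assert are made up here, and
the assembly in the comment is the hoped-for output (modeled on the subq
quoted above), not something any particular gcc version is known to emit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcount-style helper; the name is an assumption. */
bool dec_and_test(long *count)
{
	assert(*count > 0);	/* caller is assumed to hold a reference */

	/*
	 * The decremented value is both stored back to memory and
	 * compared against zero. On x86-64 the hoped-for code is one
	 * read-modify-write whose flags feed the branch, roughly:
	 *
	 *	subq	$1, (%rdi)
	 *	je	...
	 */
	return --*count == 0;
}
```

A caller would typically write "if (dec_and_test(&obj_count)) release(obj);",
where obj_count and release are again placeholders.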

I'm used to gcc sometimes being confused by more complicated things (inline
asms, bitfields etc), but this is really basic code.

The load-store model is fine for a Pentium 4 - those things were not very good
at complex instructions. But it generates horribly big code, and modern x86
chips all want the "operate on memory" version.
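
To make the load-store versus operate-on-memory distinction concrete, here is
a sketch of the same decrement written both ways at the C level. The function
names are hypothetical. The two functions are semantically identical, so which
instruction sequence comes out is entirely the compiler's choice; the
sequences described in the comments are a rough illustration, not actual
compiler output.

```c
#include <assert.h>

/*
 * Load-store shape: load into a register, subtract there, store the
 * register back, then test the register.
 */
int dec_load_store(long *p)
{
	long tmp = *p;		/* load */

	tmp -= 1;		/* operate in a register */
	*p = tmp;		/* store */
	return tmp == 0;	/* separate test of the result */
}

/*
 * Operate-on-memory shape: one read-modify-write on *p, such as
 * "subq $1, (mem)", with the zero flag set by the subtract itself.
 */
int dec_in_place(long *p)
{
	return (*p -= 1) == 0;
}

/* Sanity check: both spellings store the same value and agree on zero. */
void check_same(long start)
{
	long a = start, b = start;
	int ra = dec_load_store(&a);
	int rb = dec_in_place(&b);

	assert(ra == rb);
	assert(a == b);
}
```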

> Manually sinking the store to *argv into the if and the else yields

Yeah. And that's pretty horrible. 

> As usual combine doesn't like stores.

Is there some reason this can't just be a peephole pattern?

I really thought that gcc has done this before. 

                       Linus
