[Bug c++/42209] New: missed optimization leads to several times slower code

2009-11-28 Thread gb-0001 at xsim dot com
 jmp    80484d0 <_Z11switch_caseIjET_hS0_S0_S0_+0x20>
 804856b:   90                    nop
 804856c:   8d 74 26 00           lea    0x0(%esi,%eiz,1),%esi
 8048570:   31 da                 xor    %ebx,%edx
 8048572:   e9 59 ff ff ff        jmp    80484d0 <_Z11switch_caseIjET_hS0_S0_S0_+0x20>
 8048577:   89 f6                 mov    %esi,%esi
 8048579:   8d bc 27 00 00 00 00  lea    0x0(%edi,%eiz,1),%edi

08048580 <_Z4slowPjPKj>:
 8048580:   55                    push   %ebp
 8048581:   89 e5                 mov    %esp,%ebp
 8048583:   83 ec 10              sub    $0x10,%esp
 8048586:   89 7d fc              mov    %edi,-0x4(%ebp)
 8048589:   8b 7d 0c              mov    0xc(%ebp),%edi
 804858c:   89 75 f8              mov    %esi,-0x8(%ebp)
 804858f:   8b 75 08              mov    0x8(%ebp),%esi
 8048592:   89 5d f4              mov    %ebx,-0xc(%ebp)
 8048595:   8b 5f 04              mov    0x4(%edi),%ebx
 8048598:   8b 07                 mov    (%edi),%eax
 804859a:   c7 04 24 fc ff ff ff  movl   $0xfffffffc,(%esp)
 80485a1:   8b 16                 mov    (%esi),%edx
 80485a3:   89 d9                 mov    %ebx,%ecx
 80485a5:   c1 e8 08              shr    $0x8,%eax
 80485a8:   c1 e1 18              shl    $0x18,%ecx
 80485ab:   09 c1                 or     %eax,%ecx
 80485ad:   b8 0c 00 00 00        mov    $0xc,%eax
 80485b2:   e8 f9 fe ff ff        call   80484b0 <_Z11switch_caseIjET_hS0_S0_S0_>
 80485b7:   89 06                 mov    %eax,(%esi)
 80485b9:   b8 01 00 00 00        mov    $0x1,%eax
 80485be:   8b 54 87 04           mov    0x4(%edi,%eax,4),%edx
 80485c2:   c1 eb 08              shr    $0x8,%ebx
 80485c5:   89 d1                 mov    %edx,%ecx
 80485c7:   c1 e1 18              shl    $0x18,%ecx
 80485ca:   09 cb                 or     %ecx,%ebx
 80485cc:   89 1c 86              mov    %ebx,(%esi,%eax,4)
 80485cf:   83 c0 01              add    $0x1,%eax
 80485d2:   89 d3                 mov    %edx,%ebx
 80485d4:   83 f8 3f              cmp    $0x3f,%eax
 80485d7:   75 e5                 jne    80485be <_Z4slowPjPKj+0x3e>
 80485d9:   8b 87 00 01 00 00     mov    0x100(%edi),%eax
 80485df:   89 d1                 mov    %edx,%ecx
 80485e1:   c7 04 24 ff ff ff 0f  movl   $0xfffffff,(%esp)
 80485e8:   8b 96 fc 00 00 00     mov    0xfc(%esi),%edx
 80485ee:   c1 e9 08              shr    $0x8,%ecx
 80485f1:   c1 e0 18              shl    $0x18,%eax
 80485f4:   09 c1                 or     %eax,%ecx
 80485f6:   b8 0c 00 00 00        mov    $0xc,%eax
 80485fb:   e8 b0 fe ff ff        call   80484b0 <_Z11switch_caseIjET_hS0_S0_S0_>
 8048600:   89 86 fc 00 00 00     mov    %eax,0xfc(%esi)
 8048606:   8b 5d f4              mov    -0xc(%ebp),%ebx
 8048609:   8b 75 f8              mov    -0x8(%ebp),%esi
 804860c:   8b 7d fc              mov    -0x4(%ebp),%edi
 804860f:   89 ec                 mov    %ebp,%esp
 8048611:   5d                    pop    %ebp
 8048612:   c3                    ret
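
For readers who prefer source to disassembly, the listing for slow() can be
read back into C++ roughly as below.  This is a reconstruction, not the
reporter's code.  The symbols demangle to
"unsigned int switch_case<unsigned int>(unsigned char, unsigned int, unsigned int, unsigned int)"
and "slow(unsigned int*, unsigned int const*)".  The sketch assumes the call
passes its arguments in %eax, %edx, %ecx and then on the stack, in that order,
and it takes the two mask constants from the encoded immediate bytes
(fc ff ff ff and ff ff ff 0f).  The body of switch_case() is a made-up
placeholder, since the thread does not show it; slow_sketch is likewise a
hypothetical name.

```cpp
#include <cassert>

// Placeholder body: the real switch_case() is not shown in the report.
// The "masked merge" for op 12 is an assumption made only so this compiles.
template <typename T>
T switch_case(unsigned char op, T a, T b, T c)
{
    switch (op) {
    case 12: return (a & ~c) | (b & c);  // hypothetical
    default: return a ^ b;               // hypothetical
    }
}

// Reading of the disassembly of slow(): a copy of src into dst, shifted by
// 8 bits across word boundaries, with the first and last words routed
// through switch_case() using a constant op of 12 (mov $0xc,%eax).
void slow_sketch(unsigned *dst, const unsigned *src)
{
    // First word: call with mask 0xfffffffc on the stack.
    dst[0] = switch_case<unsigned>(12, dst[0],
                                   (src[0] >> 8) | (src[1] << 24),
                                   0xfffffffcu);

    // Inner loop at 80485be..80485d7: index runs from 1 up to, not
    // including, 0x3f, and contains no call at all.
    for (int i = 1; i < 63; ++i)
        dst[i] = (src[i] >> 8) | (src[i + 1] << 24);

    // Last word: loads src[64] (offset 0x100) and dst[63] (offset 0xfc),
    // then calls with mask 0x0fffffff.
    dst[63] = switch_case<unsigned>(12, dst[63],
                                    (src[63] >> 8) | (src[64] << 24),
                                    0x0fffffffu);
}
```

Read this way, both calls pass a compile-time constant "op", which is the
situation the report is about: the out-of-line callee still has to dispatch on
"op" at run time.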


-- 
           Summary: missed optimization leads to several times slower code
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: gb-0001 at xsim dot com
 GCC build triplet: i486-linux-gnu
  GCC host triplet: i486-linux-gnu
GCC target triplet: i486-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42209



[Bug tree-optimization/42209] missed optimization leads to several times slower code

2009-11-29 Thread gb-0001 at xsim dot com


--- Comment #2 from gb-0001 at xsim dot com  2009-11-29 17:34 ---
>[For the call in the loop GCC assumes it is more beneficial]

And in this case it is: the inner loop code is even simpler than the
prologue/epilogue code.

>[If you are sure it is always beneficial...]

It is not always beneficial.  It is close enough to always if the "op"
parameter is a compile-time constant, and "op" usually is a compile-time
constant.  Taking advantage of that would require annotating the call site with
conditional-inlining information.  Is that possible in GCC?

>[It is unlikely fixed in 4.4]

This is not important (for me) to fix in 4.4 -- the code is not yet public and
even when it is, it is not clear anybody else will use it.  My principal
concerns are that it would be nice if my code were faster, and that this may
represent a class of lost optimizations for others.  I filed this ticket at
reduced severity to reflect that; feel free to adjust priority/severity as
appropriate (or tell me what to change).

>[As 4.5 works...]

My reading is 4.5 inlines it if told to always_inline, but inlining is a loss
when "op" is a runtime variable -- it would inline the code up to about 20
times without being able to optimize any inlined copy.  Is there a way to
annotate "inline if op is a compile-time constant"?
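
One idiom that approximates "inline if op is a compile-time constant" with
existing GCC extensions is to test the argument with __builtin_constant_p at
the call site and dispatch to either an always_inline copy or a single
noinline copy.  The sketch below is an illustration under assumptions, not
something proposed in this thread: the names switch_case_inl, switch_case_ool
and SWITCH_CASE are hypothetical, and the switch body is a placeholder because
the real switch_case() is not shown.

```cpp
#include <cassert>

// Always-inlined copy.  Placeholder body; the real cases are not shown in
// the report, so these operations are assumptions.
template <typename T>
__attribute__((always_inline)) inline
T switch_case_inl(unsigned char op, T a, T b, T c)
{
    switch (op) {
    case 0:  return a & b;
    case 1:  return a | b;
    case 2:  return a ^ b;
    case 12: return (a & ~c) | (b & c);
    default: return a;
    }
}

// Single out-of-line copy, used when "op" is only known at run time, so the
// full switch is not duplicated at every caller.
template <typename T>
__attribute__((noinline))
T switch_case_ool(unsigned char op, T a, T b, T c)
{
    return switch_case_inl(op, a, b, c);
}

// Call-site dispatch.  __builtin_constant_p does not evaluate its argument,
// and only one arm of ?: is evaluated, so "op" is evaluated once.
#define SWITCH_CASE(op, a, b, c)                                  \
    (__builtin_constant_p(op) ? switch_case_inl((op), (a), (b), (c)) \
                              : switch_case_ool((op), (a), (b), (c)))
```

With a literal such as SWITCH_CASE(12, x, y, m), the compiler can fold the
switch down to the one live case; with a run-time "op", the call goes to the
shared out-of-line copy.  Both paths compute the same result, so the choice
affects only code size and speed.  Whether __builtin_constant_p reports a
non-literal argument as constant can depend on the optimization level.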





[Bug tree-optimization/42209] missed optimization leads to several times slower code

2009-11-29 Thread gb-0001 at xsim dot com


--- Comment #5 from gb-0001 at xsim dot com  2009-11-30 02:14 ---
>[It works in 4.5 with "inline", "always_inline" not needed.]

Ah, I misunderstood -- seems good to me.  I'd say fixed in 4.5 unless somebody
else cares.

Digression: this suggests an attribute such as "inline_if_reduces", which
inlines when the inlined (callee) code is simplified, but otherwise keeps it out
of line.  In other words, code growth is okay, but not when the only saving is
call/return overhead.  For "switch_case()", "inline_if_reduces_50" (inline if
the inlined callee is under 50% of the size of the out-of-line version) would be
good: here, inlining reduces the dynamic code path by about 80%, and the inlined
code size (at each caller) is under 5% of the size before inline simplification.
Apart from a slight increase in code size, the win in this case is big enough
(once the compiler knows some code expansion is okay) that a crude threshold
suffices.  The threshold need not be precise (what's the size of an x86
instruction vs. a MIPS instruction, etc.), yet it mostly avoids false positives
(inlining that hurts because the simplification is at best minor).  In my
experience, the biggest win from inlining with code growth comes from cases that
get a lot better; when the difference is small, out of line is either almost as
good or better.  (End of digression.)
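
To make the proposed threshold concrete, the decision rule could be sketched as
a simple size comparison.  This is only an illustration of the idea above: no
"inline_if_reduces" attribute exists in GCC as far as this thread shows, the
function name and parameters are hypothetical, and the sizes are in whatever
units the compiler's size estimate uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical rule for "inline_if_reduces_N": inline only if the callee,
// after simplification at this call site, is under N percent of the size of
// the out-of-line version.  Integer arithmetic avoids floating point.
constexpr bool inline_if_reduces(std::size_t out_of_line_size,
                                 std::size_t simplified_size,
                                 unsigned threshold_percent)
{
    return simplified_size * 100 < out_of_line_size * threshold_percent;
}
```

With the figures quoted above (inlined copy under 5% of the original size,
threshold 50), inline_if_reduces(100, 5, 50) says to inline, while a copy that
only shrinks to 95% of the original, inline_if_reduces(100, 95, 50), stays out
of line.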

Thanks!

