[Bug c++/42209] New: missed optimization leads to several times slower code
                                   jmp    80484d0 <_Z11switch_caseIjET_hS0_S0_S0_+0x20>
 804856b:  90                      nop
 804856c:  8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
 8048570:  31 da                   xor    %ebx,%edx
 8048572:  e9 59 ff ff ff          jmp    80484d0 <_Z11switch_caseIjET_hS0_S0_S0_+0x20>
 8048577:  89 f6                   mov    %esi,%esi
 8048579:  8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi

08048580 <_Z4slowPjPKj>:
 8048580:  55                      push   %ebp
 8048581:  89 e5                   mov    %esp,%ebp
 8048583:  83 ec 10                sub    $0x10,%esp
 8048586:  89 7d fc                mov    %edi,-0x4(%ebp)
 8048589:  8b 7d 0c                mov    0xc(%ebp),%edi
 804858c:  89 75 f8                mov    %esi,-0x8(%ebp)
 804858f:  8b 75 08                mov    0x8(%ebp),%esi
 8048592:  89 5d f4                mov    %ebx,-0xc(%ebp)
 8048595:  8b 5f 04                mov    0x4(%edi),%ebx
 8048598:  8b 07                   mov    (%edi),%eax
 804859a:  c7 04 24 fc ff ff ff    movl   $0xfffc,(%esp)
 80485a1:  8b 16                   mov    (%esi),%edx
 80485a3:  89 d9                   mov    %ebx,%ecx
 80485a5:  c1 e8 08                shr    $0x8,%eax
 80485a8:  c1 e1 18                shl    $0x18,%ecx
 80485ab:  09 c1                   or     %eax,%ecx
 80485ad:  b8 0c 00 00 00          mov    $0xc,%eax
 80485b2:  e8 f9 fe ff ff          call   80484b0 <_Z11switch_caseIjET_hS0_S0_S0_>
 80485b7:  89 06                   mov    %eax,(%esi)
 80485b9:  b8 01 00 00 00          mov    $0x1,%eax
 80485be:  8b 54 87 04             mov    0x4(%edi,%eax,4),%edx
 80485c2:  c1 eb 08                shr    $0x8,%ebx
 80485c5:  89 d1                   mov    %edx,%ecx
 80485c7:  c1 e1 18                shl    $0x18,%ecx
 80485ca:  09 cb                   or     %ecx,%ebx
 80485cc:  89 1c 86                mov    %ebx,(%esi,%eax,4)
 80485cf:  83 c0 01                add    $0x1,%eax
 80485d2:  89 d3                   mov    %edx,%ebx
 80485d4:  83 f8 3f                cmp    $0x3f,%eax
 80485d7:  75 e5                   jne    80485be <_Z4slowPjPKj+0x3e>
 80485d9:  8b 87 00 01 00 00       mov    0x100(%edi),%eax
 80485df:  89 d1                   mov    %edx,%ecx
 80485e1:  c7 04 24 ff ff ff 0f    movl   $0xfff,(%esp)
 80485e8:  8b 96 fc 00 00 00       mov    0xfc(%esi),%edx
 80485ee:  c1 e9 08                shr    $0x8,%ecx
 80485f1:  c1 e0 18                shl    $0x18,%eax
 80485f4:  09 c1                   or     %eax,%ecx
 80485f6:  b8 0c 00 00 00          mov    $0xc,%eax
 80485fb:  e8 b0 fe ff ff          call   80484b0 <_Z11switch_caseIjET_hS0_S0_S0_>
 8048600:  89 86 fc 00 00 00       mov    %eax,0xfc(%esi)
 8048606:  8b 5d f4                mov    -0xc(%ebp),%ebx
 8048609:  8b 75 f8                mov    -0x8(%ebp),%esi
 804860c:  8b 7d fc                mov    -0x4(%ebp),%edi
 804860f:  89 ec                   mov    %ebp,%esp
 8048611:  5d                      pop    %ebp
 8048612:  c3                      ret

--
Summary: missed optimization leads to several times slower code
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: gb-0001 at xsim dot com
GCC build triplet: i486-linux-gnu
GCC host triplet: i486-linux-gnu
GCC target triplet: i486-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42209
[Bug tree-optimization/42209] missed optimization leads to several times slower code
--- Comment #2 from gb-0001 at xsim dot com 2009-11-29 17:34 ---

> [For the call in the loop GCC assumes it is more beneficial]

And in this case it is: the inner-loop code is even simpler than the prologue/epilogue code.

> [If you are sure it is always beneficial...]

It is not always beneficial. It is close enough to always when the "op" parameter is a compile-time constant, and "op" usually is one. Taking advantage of that would require annotating the call site with conditional inlining information. Is that possible in GCC?

> [It is unlikely fixed in 4.4]

It is not important (for me) to fix this in 4.4: the code is not yet public, and even when it is, it is not clear anybody else will use it. My principal concerns are that it would be nice if my code were faster, and that this may represent a class of lost optimizations for others. I filed this ticket at reduced severity to reflect that; feel free to adjust the priority/severity (or tell me what to change).

> [As 4.5 works...]

My reading is that 4.5 inlines it if told to always_inline, but inlining is a loss when "op" is a runtime variable: it would inline the code up to about 20 times without being able to optimize any inlined copy. Is there a way to annotate "inline if op is a compile-time constant"?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42209
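One way to approximate "inline if op is a compile-time constant" in user code is GCC's __builtin_constant_p builtin combined with the always_inline and noinline attributes. The following is only a sketch under assumptions: the body of switch_case() is not shown in this report, so the opcode table below is invented for illustration, and the helper names switch_case_body and switch_case_outline are hypothetical. GCC documents that __builtin_constant_p may return 0 even for a constant argument (for example without optimization), so this is a hint to the compiler rather than a guarantee, and it has not been checked against 4.4.1.

```cpp
// Hypothetical stand-in for the reporter's switch_case(); the real body is
// not part of the report, so these opcodes are invented for illustration.
template <typename T>
static inline __attribute__((always_inline))
T switch_case_body(unsigned char op, T dst, T src, T mask) {
    T r;
    switch (op) {
        case 0:  r = 0;         break;
        case 1:  r = dst & src; break;
        case 2:  r = dst | src; break;
        case 3:  r = dst ^ src; break;
        default: r = src;       break;
    }
    // Only the bits selected by mask are replaced in dst.
    return (dst & ~mask) | (r & mask);
}

// Single out-of-line copy, used when op is not known to be constant.
template <typename T>
__attribute__((noinline))
T switch_case_outline(unsigned char op, T dst, T src, T mask) {
    return switch_case_body(op, dst, src, mask);
}

// Dispatcher: if the compiler can prove op is constant at this call site,
// inline the body so the switch can fold away; otherwise make a plain call.
template <typename T>
static inline __attribute__((always_inline))
T switch_case(unsigned char op, T dst, T src, T mask) {
    if (__builtin_constant_p(op))
        return switch_case_body(op, dst, src, mask);
    return switch_case_outline(op, dst, src, mask);
}
```

Both paths compute the same result, so correctness does not depend on which branch the compiler picks; only code size and speed do. A caller with a runtime "op" pays for one shared out-of-line copy rather than a full unsimplified copy at each call site.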
[Bug tree-optimization/42209] missed optimization leads to several times slower code
--- Comment #5 from gb-0001 at xsim dot com 2009-11-30 02:14 ---

> [It works in 4.5 with "inline", "always_inline" not needed.]

Ah, I misunderstood; that seems good to me. I'd say fixed in 4.5 unless somebody else cares.

Digression: this suggests an attribute such as "inline_if_reduces", which inlines if the inlined (callee) code is simplified, but otherwise keeps it out of line. In other words, code growth is okay, but not when the only saving is the call/return overhead.

For "switch_case()", "inline_if_reduces_50" (inline if the inlined callee is under 50% of the size of the out-of-line version) would be good: here, inlining reduces the dynamic code path by about 80%, and the inlined code size (at each caller) is under 5% of the size before inline simplification.

Apart from a slight increase in code size, the win in this case is big enough (once the compiler knows some code expansion is okay) that a crude threshold would do. It does not need to be precise (what is the size of an x86 instruction vs. a MIPS instruction, etc.), yet it mostly avoids false positives (inlining that hurts because the simplification is at best minor). In my experience, the biggest wins from inlining with code growth come from cases that get a lot better; when the difference is small, out of line is either almost as good or better.

(End of digression.) Thanks!

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42209
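The threshold rule in the digression can be written down as a small predicate. This is a sketch of the proposal only: "inline_if_reduces" is not an existing GCC attribute, the function and parameter names below are hypothetical, and the size unit is left deliberately crude (an instruction count would do), as the comment suggests.

```cpp
#include <cstddef>

// Hypothetical decision rule for the proposed "inline_if_reduces_N" attribute.
// out_of_line_size:       size of the callee body as compiled out of line
// simplified_inline_size: size of the inlined copy after simplification
// percent:                the N in inline_if_reduces_N
inline bool inline_if_reduces(std::size_t out_of_line_size,
                              std::size_t simplified_inline_size,
                              unsigned percent) {
    // Inline only when the simplified copy is strictly under percent% of the
    // out-of-line body; multiplying instead of dividing avoids rounding.
    return simplified_inline_size * 100 < out_of_line_size * percent;
}
```

With the figures quoted above (inlined copy under 5% of the original size), inline_if_reduces(100, 5, 50) accepts the inline, while a copy that only sheds the call/return overhead, say inline_if_reduces(100, 95, 50), stays out of line.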