http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #16 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 18:24:43 UTC ---
Perf confirms these findings. The first loop:

    0.02 :   401e10:   movslq %edx,%rbx
    5.04 :   401e13:   movss  -0x4(%rdi,%rbx,4),%xmm0
   24.97 :   401e19:   ucomiss (%r9),%xmm0
   14.66 :   401e1d:   cmova  %ecx,%edx
   15.37 :   401e20:   sub    $0x1,%ecx
    0.00 :   401e23:   sub    $0x4,%r9
    0.00 :   401e27:   cmp    %r10d,%ecx
    0.00 :   401e2a:   jne    401e10 <cptrf2_+0x230>

The second:

    0.00 :   401e60:   movslq %ecx,%r10
    1.69 :   401e63:   movss  -0x4(%rdi,%r10,4),%xmm0
    7.78 :   401e6a:   ucomiss (%r9),%xmm0
    4.75 :   401e6e:   cmova  %r11d,%ecx
    4.52 :   401e72:   sub    $0x1,%r11d
    0.00 :   401e76:   sub    $0x4,%r9
    0.05 :   401e7a:   cmp    %eax,%r11d
    0.00 :   401e7d:   jne    401e60 <cptrf2_+0x280>

The third:

    0.00 :   401ff8:   movslq %edx,%r10
    0.78 :   401ffb:   movss  -0x4(%rdi,%r10,4),%xmm0
    3.14 :   402002:   ucomiss (%r9),%xmm0
    2.04 :   402006:   cmova  %ecx,%edx
    1.89 :   402009:   sub    $0x4,%r9
    0.00 :   40200d:   sub    $0x1,%ecx
    0.00 :   402010:   jne    401ff8 <cptrf2_+0x418>
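
For readers who prefer C to disassembly, the first loop seems to correspond roughly to the sketch below. It is reconstructed from the listing alone, not from the Fortran source: the name `scan_down`, its parameters and the starting values are hypothetical, and it assumes that %r9 walks the same array that %rdi indexes. The point it illustrates is that the index selected by cmova feeds the address of the next movss load, so each iteration has to wait for the previous conditional move.

```c
/* Hypothetical C rendering of the annotated loop (assumption: one array,
 * 1-based indices as in Fortran, hence the "- 1" matching the -0x4 offset). */
static int scan_down(const float *a, int hi, int lo)
{
    int best = hi;                        /* %edx: index kept by cmova       */
    for (int i = hi - 1; i != lo; --i) {  /* %ecx counts down, %r9 follows   */
        /* movss loads a[best - 1], ucomiss compares it with a[i - 1];       */
        /* cmova replaces best only when the loaded value is strictly above. */
        if (a[best - 1] > a[i - 1])
            best = i;                     /* next load address depends on it */
    }
    return best;
}
```

Under these assumptions, calling `scan_down(a, 5, 0)` on the five elements `{3, 1, 2, 0.5, 4}` returns 4, the 1-based position of the smallest element.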