http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #16 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 18:24:43 UTC ---
Perf confirms these findings. The first loop:

    0.02 :          401e10:       movslq %edx,%rbx
    5.04 :          401e13:       movss  -0x4(%rdi,%rbx,4),%xmm0
   24.97 :          401e19:       ucomiss (%r9),%xmm0
   14.66 :          401e1d:       cmova  %ecx,%edx
   15.37 :          401e20:       sub    $0x1,%ecx
    0.00 :          401e23:       sub    $0x4,%r9
    0.00 :          401e27:       cmp    %r10d,%ecx
    0.00 :          401e2a:       jne    401e10 <cptrf2_+0x230>
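
For readers who do not read AT&T syntax, here is a rough C reading of the loop body shown in this listing. It is reconstructed from the disassembly only, not from the source of cptrf2, so the function name, parameter names and types are all hypothetical. The sketch is mainly meant to show that the load at the top of each iteration depends on the index selected by the cmova of the previous iteration.

```c
/* Hypothetical C rendering of the loop shape, reconstructed from the
 * disassembly only. Names and types are assumptions, not the original code.
 *
 *   a     base of a float array, indexed 1-based as a[best - 1]  (%rdi)
 *   p     pointer that walks backwards one float per iteration    (%r9)
 *   best  index carried from one iteration to the next            (%edx)
 *   i     down-counter offered as the new index                   (%ecx)
 *   stop  counter value at which the loop exits                   (%r10d)
 */
static int scan_backward(const float *a, const float *p,
                         int best, int i, int stop)
{
    do {
        /* movslq + movss: reload a[best - 1]; the address depends on the
         * value of best chosen in the previous iteration. */
        float cur = a[best - 1];

        /* ucomiss + cmova: take the new index only when cur is strictly
         * greater than *p (the move is not taken for unordered operands). */
        if (cur > *p)
            best = i;

        --i;   /* sub $0x1,%ecx */
        --p;   /* sub $0x4,%r9; ends one element below the last one read */
    } while (i != stop);   /* cmp %r10d,%ecx; jne */

    return best;
}
```

Under this reading, every iteration has to wait for the previous conditional move before it can compute the address of its first load.
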

the second:

    0.00 :          401e60:       movslq %ecx,%r10
    1.69 :          401e63:       movss  -0x4(%rdi,%r10,4),%xmm0
    7.78 :          401e6a:       ucomiss (%r9),%xmm0
    4.75 :          401e6e:       cmova  %r11d,%ecx
    4.52 :          401e72:       sub    $0x1,%r11d
    0.00 :          401e76:       sub    $0x4,%r9
    0.05 :          401e7a:       cmp    %eax,%r11d
    0.00 :          401e7d:       jne    401e60 <cptrf2_+0x280>

the third:

    0.00 :          401ff8:       movslq %edx,%r10
    0.78 :          401ffb:       movss  -0x4(%rdi,%r10,4),%xmm0
    3.14 :          402002:       ucomiss (%r9),%xmm0
    2.04 :          402006:       cmova  %ecx,%edx
    1.89 :          402009:       sub    $0x4,%r9
    0.00 :          40200d:       sub    $0x1,%ecx
    0.00 :          402010:       jne    401ff8 <cptrf2_+0x418>
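
To compare the three loops at a glance, the per-instruction percentages above can be added up per loop. The short C sketch below does only that, with the numbers copied from the listings; the array and function names are made up for this illustration. The totals come to roughly 60.06% for the first loop, 18.79% for the second and 7.85% for the third.

```c
#include <stddef.h>

/* Percentages copied from the three perf listings, one entry per
 * instruction, in listing order. Array names are illustrative only. */
static const double loop1_pct[] = {0.02, 5.04, 24.97, 14.66, 15.37, 0.00, 0.00, 0.00};
static const double loop2_pct[] = {0.00, 1.69, 7.78, 4.75, 4.52, 0.00, 0.05, 0.00};
static const double loop3_pct[] = {0.00, 0.78, 3.14, 2.04, 1.89, 0.00, 0.00};

/* Sum the per-instruction percentages of one loop. */
static double loop_total(const double *pct, size_t n)
{
    double total = 0.0;
    for (size_t k = 0; k < n; ++k)
        total += pct[k];
    return total;
}
```

For example, `loop_total(loop1_pct, sizeof loop1_pct / sizeof loop1_pct[0])` gives the total for the first loop.
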
