------- Comment #8 from nbenoit at tuxfamily dot org  2009-12-16 10:34 -------
I am confused; a performance regression is still noticeable:

* Intel Xeon E5320 (x86_64 arch but gcc machine is i686-pc-linux-gnu), with -O1 flag
    GCC-4.4.2            7364 ms
    GCC-trunk-r155286    9515 ms

* Intel Xeon 5160 (x86_64 arch and gcc machine is x86_64-linux-gnu), with -O1 flag
    GCC-4.4.1            5960 ms
    GCC-trunk-r155286    7355 ms

Here is a diff on the assembly generated for the Intel E5320:

$ diff 442/convol.s r155286/convol.s
11c11
<       subl    $8, %esp
---
>       subl    $12, %esp
13d12
<       movl    $H, %esi
17c16
<       imull   (%esi,%eax,4), %ebx
---
>       imull   H(,%eax,4), %ebx
22c21
<       jg      .L10
---
>       setle   %bl
24,25c23,25
<       jle     .L3
< .L10:
---
>       setle   -21(%ebp)
>       testb   %bl, -21(%ebp)
>       jne     .L3
28c28
< .L6:
---
> .L5:
31,32c31,32
<       je      .L5
< .L8:
---
>       je      .L4
> .L7:
34c34
<       js      .L6
---
>       js      .L5
40c40
< .L5:
---
> .L4:
43c43
<       je      .L7
---
>       je      .L6
46,47c46,47
<       jmp     .L8
< .L7:
---
>       jmp     .L7
> .L6:
50c50
<       addl    $8, %esp
---
>       addl    $12, %esp
60c60
<       .ident  "GCC: (GNU) 4.4.2"
---
>       .ident  "GCC: (GNU) 4.5.0 20091216 (experimental)"


-- 

nbenoit at tuxfamily dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42027