------- Comment #7 from ubizjak at gmail dot com 2008-04-24 19:56 -------
Created an attachment (id=15527)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15527&action=view)
x86_64 asm dump of trisolve procedure (generated without the patch)

All of the difference is in the trisolve procedure (attached). Performance is 10% better if trisolve in the dump is replaced with the attached function. I'm using -O2 -funroll-loops.

BTW: There are two loops in this asm (.L3 and .L5). In the current asm, the suspicious parts are:

        movsd   16(%r9), %xmm6
        mulsd   16(%r8), %xmm6

and

        mulsd   -16(%rdx), %xmm0
        mulsd   -16(%r11), %xmm0

That is, loads from different addresses that are not present in the non-patched asm.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163