------- Comment #7 from ubizjak at gmail dot com 2008-04-24 19:56 -------
Created an attachment (id=15527)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15527&action=view)
x86_64 asm dump of trisolve procedure (generated without the patch)
The entire difference is in the trisolve procedure (attached). Performance is
10% better if trisolve in the dump is replaced with the attached function.
I'm using -O2 -funroll-loops.
BTW: There are two loops in this asm (.L3 and .L5). In the current asm, the
suspicious parts are:
movsd 16(%r9), %xmm6
mulsd 16(%r8), %xmm6
and
mulsd -16(%rdx), %xmm0
mulsd -16(%r11), %xmm0
That is, loads from different addresses that are not present in the
non-patched asm.
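
For readers less used to AT&T syntax, here is a rough C reading of the two
fragments above. It is only a sketch: the pointer names simply mirror the
registers, and which arrays of trisolve they actually point to cannot be told
from these excerpts, so treat them as hypothetical. It assumes 8-byte doubles,
so a byte offset of 16 is element 2 and -16 is element -2.

```c
/* Hypothetical names: each pointer stands for whatever array the
 * same-named register addresses inside trisolve. */

/* movsd 16(%r9), %xmm6
 * mulsd 16(%r8), %xmm6
 * Load one double from r9 + 16 bytes, multiply by the double at r8 + 16. */
double first_pair(const double *r9, const double *r8)
{
    double xmm6 = r9[2];    /* movsd: load element 2 (byte offset 16) */
    xmm6 *= r8[2];          /* mulsd with a memory operand: second load */
    return xmm6;
}

/* mulsd -16(%rdx), %xmm0
 * mulsd -16(%r11), %xmm0
 * Multiply the running value by two doubles loaded from two different bases. */
double second_pair(double xmm0, const double *rdx, const double *r11)
{
    xmm0 *= rdx[-2];        /* load from rdx - 16 bytes */
    xmm0 *= r11[-2];        /* load from r11 - 16 bytes */
    return xmm0;
}
```

In both fragments every multiply takes its operand straight from memory, and
the two operands come from two distinct base registers, which is what makes
them stand out.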
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163