------- Comment #7 from ubizjak at gmail dot com  2008-04-24 19:56 -------
Created an attachment (id=15527)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15527&action=view)
x86_64 asm dump of trisolve procedure (generated without the patch)

The entire difference is in the trisolve procedure (attached). Performance is
10% better if trisolve in the dump is replaced with the attached function.

I'm using -O2 -funroll-loops.

BTW: There are two loops in this asm (.L3 and .L5). In the current asm, the
suspicious parts are:

        movsd   16(%r9), %xmm6
        mulsd   16(%r8), %xmm6

and

        mulsd   -16(%rdx), %xmm0
        mulsd   -16(%r11), %xmm0

That is, these are loads from different addresses that are not present in the
non-patched asm.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
