------- Comment #15 from zaks at il dot ibm dot com  2008-02-08 08:49 -------
(In reply to comment #5)
> (In reply to comment #3)
> > I think this is a dup of another bug I filed with respect to the builtin
> > operator new getting the malloc attribute.
> Are you referring to using malloc instead of new?
> Using malloc didn't make any difference performance-wise.

Using malloc instead of new does generate better code and improves performance
slightly for me, admittedly not as much as we would like; the kernel becomes:

(using only -O3 -S -m64 -maltivec)

.L29:
        lvx 13,7,9
        lvx 12,3,9
        vperm 1,10,13,7
        vperm 11,9,12,8
        lvx 0,29,9
        vor 10,13,13
        vor 9,12,12
        vaddfp 1,1,11
        vaddfp 0,0,1
        stvx 0,29,9
        addi 9,9,16
        bdnz .L29

which, if I'm not mistaken, is as good as the vectorizer can get: it peels the
loop to align the store (and the load from the same address) and treats the
other two loads as potentially unaligned.
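A rough scalar model of that transformation may help. This is only a sketch, not GCC's code. The loop body a[i] += b[i] + c[i] is my reading of the kernel (two realigned loads summed, then added to an aligned load and stored back to the same address). lvx, stvx, realign, kernel and matches_scalar are hypothetical helpers that emulate the instructions on a flat float buffer whose index 0 is assumed to be 16-byte aligned. The arrays are assumed not to overlap, and 4 floats of padding must follow b and c because the aligned loads can read past their ends. Carrying prev_b/prev_c across iterations corresponds to the "vor 10,13,13" and "vor 9,12,12" copies above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an AltiVec "vector float" register.
using Vec4 = std::array<float, 4>;

// Emulates lvx: the index is rounded down to a 4-float (16-byte) boundary.
Vec4 lvx(const std::vector<float>& mem, std::size_t idx) {
    std::size_t base = idx & ~std::size_t{3};
    return {mem[base], mem[base + 1], mem[base + 2], mem[base + 3]};
}

// Emulates stvx: stores 4 floats at the boundary at or below idx.
void stvx(std::vector<float>& mem, std::size_t idx, const Vec4& v) {
    std::size_t base = idx & ~std::size_t{3};
    for (std::size_t k = 0; k < 4; ++k) mem[base + k] = v[k];
}

// Emulates the vperm realignment: elements [shift, shift + 4) of lo:hi.
Vec4 realign(const Vec4& lo, const Vec4& hi, std::size_t shift) {
    Vec4 r{};
    for (std::size_t k = 0; k < 4; ++k) {
        std::size_t j = shift + k;
        r[k] = j < 4 ? lo[j] : hi[j - 4];
    }
    return r;
}

Vec4 add(const Vec4& x, const Vec4& y) {
    return {x[0] + y[0], x[1] + y[1], x[2] + y[2], x[3] + y[3]};
}

// a[i] += b[i] + c[i] for i in [0, n); a, b, c are element offsets into mem.
void kernel(std::vector<float>& mem, std::size_t a, std::size_t b,
            std::size_t c, std::size_t n) {
    std::size_t i = 0;
    // Prologue: peel scalar iterations until the store address a + i is aligned.
    std::size_t peel = (4 - a % 4) % 4;
    for (; i < n && i < peel; ++i) mem[a + i] += mem[b + i] + mem[c + i];

    // After peeling, the misalignment of b and c is loop-invariant.
    std::size_t sb = (b + i) % 4, sc = (c + i) % 4;
    Vec4 prev_b = lvx(mem, b + i), prev_c = lvx(mem, c + i);
    for (; i + 4 <= n; i += 4) {
        Vec4 next_b = lvx(mem, b + i + 4);  // one new aligned load per stream
        Vec4 next_c = lvx(mem, c + i + 4);
        Vec4 sum = add(realign(prev_b, next_b, sb), realign(prev_c, next_c, sc));
        stvx(mem, a + i, add(lvx(mem, a + i), sum));  // aligned load and store
        prev_b = next_b;  // reuse the loaded vector next iteration
        prev_c = next_c;
    }
    // Epilogue: remaining scalar iterations.
    for (; i < n; ++i) mem[a + i] += mem[b + i] + mem[c + i];
}

// Self-check against a plain scalar loop on a padded buffer.
bool matches_scalar(std::size_t a, std::size_t b, std::size_t c, std::size_t n) {
    std::size_t size = std::max({a, b, c}) + n + 8;  // padding for over-reads
    std::vector<float> mem(size);
    for (std::size_t k = 0; k < size; ++k) mem[k] = static_cast<float>(k % 7);
    std::vector<float> ref = mem;
    kernel(mem, a, b, c, n);
    for (std::size_t i = 0; i < n; ++i) ref[a + i] += ref[b + i] + ref[c + i];
    return mem == ref;
}
```

For example, matches_scalar(1, 40, 83, 30) exercises a 3-iteration prologue, a vector loop with both streams misaligned, and a scalar epilogue.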

To further optimize this loop we would probably want to overlap the store with
subsequent loads using -fmodulo-sched; perhaps the new export-ddg can help with
that.
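For illustration only, this is what overlapping the store with subsequent loads means at the source level: a hand software-pipelined, scalar version of the same hypothetical a[i] += b[i] + c[i] loop, in which the loads for iteration i + 1 are issued before the store for iteration i. The names pipelined and pipelined_matches are hypothetical. -fmodulo-sched would do this on the scheduled instructions; writing it by hand in C++ does not force that schedule. The reordering is valid only if the store to a[i] cannot alias the next iteration's loads, which is the kind of no-alias information the malloc attribute conveys.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// a[i] += b[i] + c[i], software-pipelined by hand. Assumes a, b and c are
// distinct, non-overlapping arrays, so hoisting loads above the store is safe.
void pipelined(float* a, const float* b, const float* c, std::size_t n) {
    if (n == 0) return;
    // Prologue: loads for iteration 0.
    float la = a[0], lb = b[0], lc = c[0];
    for (std::size_t i = 0; i + 1 < n; ++i) {
        float sum = la + (lb + lc);                   // compute iteration i
        la = a[i + 1]; lb = b[i + 1]; lc = c[i + 1];  // loads for iteration i + 1
        a[i] = sum;                                   // store for iteration i
    }
    // Epilogue: finish the last iteration.
    a[n - 1] = la + (lb + lc);
}

// Self-check against the straightforward loop.
bool pipelined_matches(std::size_t n) {
    std::vector<float> a(n), b(n), c(n);
    for (std::size_t k = 0; k < n; ++k) {
        a[k] = static_cast<float>(k % 5);
        b[k] = static_cast<float>(k % 3);
        c[k] = static_cast<float>(k % 7);
    }
    std::vector<float> ref = a;
    for (std::size_t i = 0; i < n; ++i) ref[i] += b[i] + c[i];
    pipelined(a.data(), b.data(), c.data(), n);
    return a == ref;
}
```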


-- 

zaks at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zaks at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117
