------- Comment #15 from zaks at il dot ibm dot com  2008-02-08 08:49 -------
(In reply to comment #5)
> (In reply to comment #3)
> > I think this is a dup of another bug I filed with respect of the builtin
> > operator new that getting the malloc attribute.
> Are you refering to using malloc instead of new?
> using malloc didnt make any difference performance wise.

Using malloc instead of new does generate better code and improves performance
slightly for me, admittedly not as much as we would like; the kernel becomes
(using only -O3 -S -m64 -maltivec):

.L29:
        lvx 13,7,9
        lvx 12,3,9
        vperm 1,10,13,7
        vperm 11,9,12,8
        lvx 0,29,9
        vor 10,13,13
        vor 9,12,12
        vaddfp 1,1,11
        vaddfp 0,0,1
        stvx 0,29,9
        addi 9,9,16
        bdnz .L29

which is as good as the vectorizer can get, iinm: peeling the loop to align the
store (and the load from the same address), treating the other two loads as
potentially unaligned. To further optimize this loop we would probably want to
overlap the store with subsequent loads using -fmodulo-sched; perhaps the new
export-ddg can help with that.


-- 

zaks at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zaks at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117
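
For reference, a minimal sketch of a loop with the shape discussed in this comment: one array that is both loaded and stored, plus two further loads, with the buffers obtained from malloc rather than new. This is not the test case attached to the bug. The names accumulate, alloc_floats and run_demo are hypothetical, and whether GCC emits the listing above for this exact source has not been checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical kernel (name and signature are assumptions, not from the bug):
// a[] is loaded and stored at the same address, b[] and c[] are only loaded.
void accumulate(float *a, const float *b, const float *c, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        a[i] = a[i] + (b[i] + c[i]);
}

// Allocate n floats with malloc (instead of new[]) and fill them with value.
float *alloc_floats(std::size_t n, float value)
{
    assert(n > 0);
    float *p = static_cast<float *>(std::malloc(n * sizeof(float)));
    if (p == nullptr)
        std::abort();
    for (std::size_t i = 0; i < n; ++i)
        p[i] = value;
    return p;
}

// Run the kernel on freshly allocated buffers and check every element.
bool run_demo(std::size_t n)
{
    float *a = alloc_floats(n, 1.0f);
    float *b = alloc_floats(n, 2.0f);
    float *c = alloc_floats(n, 3.0f);

    accumulate(a, b, c, n);

    bool ok = true;
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != 6.0f)   // 1 + (2 + 3), exact in float
            ok = false;

    std::free(a);
    std::free(b);
    std::free(c);
    return ok;
}
```

Compiling something like this with the flags quoted above (-O3 -S -m64 -maltivec) on a PowerPC target and swapping malloc for new[] in alloc_floats would be one way to compare the two kernels.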