Compiling the following testcase:

#define vector __attribute__((__vector_size__(16) ))

float fa[100] __attribute__ ((__aligned__(16)));

vector float foo ()
{
  float f = fa[0];
  vector float vf = {f, f, f, f};
  return vf;
}
...with gcc -O2 -maltivec, we get:

        ld r9,0(r2)
        lfs f0,0(r9)
        addi r9,r1,-16
        stfs f0,-16(r1)
        lvewx v2,r0,r9
        vspltw v2,v2,0
        blr

My problem is with the {lfs,stfs,lvewx} sequence: we load a value into f0, and then store it (with stfs) into an aligned memory location so that it can be loaded from there into a vector (with lvewx). However, since the address from which f0 was loaded is known to be aligned, we could do the lvewx directly from there and avoid the extra {lfs,stfs}, so the following should be enough:

        ld r9,0(r2)
        lvewx v2,r0,r9
        vspltw v2,v2,0
        blr

The problem is that rs6000_expand_vector_init doesn't know that f0 originates from an aligned address. It gets the following as vals:

(parallel:V4SF [
        (reg/v:SF 119 [ f ])
        (reg/v:SF 119 [ f ])
        (reg/v:SF 119 [ f ])
        (reg/v:SF 119 [ f ])
    ])

We somehow want to expand 'f = fa[0]' and '{f,f,f,f}' together... If expand_vector_init could get '{fa[0],fa[0],fa[0],fa[0]}' as vals, it could see that the original address is aligned.

Alternatively, the prospects of getting rid of the redundant load and store later on, during some kind of peephole optimization, don't seem so high to me...

Thoughts?

This may be related to PR31334 (though there the issue is about initialization with constants, so I'm not sure whether the idea for a solution proposed there would help us here).

--
           Summary: bad codegen for vector initialization in Altivec
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at il dot ibm dot com
 GCC build triplet: powerpc-linux
  GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32107
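Why alignment is the key fact here can be illustrated with a small host-side C model of the two AltiVec instructions involved. This is a sketch under stated assumptions, not GCC code and not a hardware simulator: lvewx places the loaded word into element (EA & 0xF) >> 2 of the target register (big-endian element numbering) and leaves the other elements undefined, and vspltw replicates one element into all four. The names vreg, UNDEFINED_LANE, model_lvewx, model_vspltw and splat_direct are hypothetical, and undefined lanes are marked with -1.0f purely so the model can show them. The existing sequence presumably works because the stack slot at -16(r1) is 16-byte aligned, so the word lands in element 0. A direct lvewx from fa[0] likewise fills element 0, but the fixed "vspltw v2,v2,0" is only correct when the compiler knows the address modulo 16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit AltiVec register: four 32-bit words in
   big-endian element order (element 0 is at the lowest address). */
typedef struct {
    float w[4];
} vreg;

/* Marker for lanes that lvewx leaves architecturally undefined. */
#define UNDEFINED_LANE (-1.0f)

/* Model of lvewx: the word at EA lands in element (EA & 0xF) >> 2;
   every other element is undefined (filled with UNDEFINED_LANE here). */
static vreg model_lvewx(const float *p)
{
    uintptr_t ea = (uintptr_t)p;
    assert((ea & 3) == 0);                 /* a float is word aligned */
    unsigned lane = (unsigned)((ea & 0xF) >> 2);
    vreg v;
    for (int i = 0; i < 4; i++)
        v.w[i] = UNDEFINED_LANE;
    v.w[lane] = *p;
    return v;
}

/* Model of vspltw: replicate element 'uimm' into all four elements. */
static vreg model_vspltw(vreg v, unsigned uimm)
{
    vreg r;
    for (int i = 0; i < 4; i++)
        r.w[i] = v.w[uimm & 3];
    return r;
}

/* The sequence suggested in the report: lvewx straight from the scalar's
   address, then vspltw with a fixed index. The result is a correct splat
   only if 'uimm' matches the lane lvewx filled, i.e. (addr & 0xF) >> 2. */
static vreg splat_direct(const float *p, unsigned uimm)
{
    return model_vspltw(model_lvewx(p), uimm);
}

/* 16-byte aligned, like 'fa' in the testcase. */
static _Alignas(16) float fa[100];
```

With fa[0] (address modulo 16 equal to 0), splat_direct(&fa[0], 0) yields four copies of fa[0], which is exactly what the shorter {lvewx,vspltw} sequence relies on. With fa[1], index 0 picks up an undefined lane, and only index 1 gives the right splat, so without the alignment information the compiler cannot choose the vspltw immediate.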