Compiling the following testcase:
#define vector __attribute__((__vector_size__(16) ))
float fa[100] __attribute__ ((__aligned__(16)));
vector float foo ()
{
  float f = fa[0];
  vector float vf = {f, f, f, f};
  return vf;
}
...with gcc -O2 -maltivec, we get:
ld r9,0(r2)
lfs f0,0(r9)
addi r9,r1,-16
stfs f0,-16(r1)
lvewx v2,r0,r9
vspltw v2,v2,0
blr
My problem is with the {lfs,stfs,lvewx} sequence: we load a value into f0 and
then store it (with stfs) to an aligned stack slot, only so that it can be
loaded from there into a vector register (with lvewx). However, since the
address from which f0 was loaded is known to be aligned, we could do the lvewx
directly from there and avoid the extra {lfs,stfs}, so the following should be
enough:
ld r9,0(r2)
lvewx v2,r0,r9
vspltw v2,v2,0
blr
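For comparison, here is a minimal sketch of how the shorter sequence can be
requested by hand at the source level. It is only an illustration, not a
proposed fix: the name foo_splat and the v4sf typedef are made up for this
example, and the AltiVec branch assumes <altivec.h> with its vec_lde and
vec_splat intrinsics, which are expected (but not guaranteed here) to map to
lvewx and vspltw. On targets without AltiVec the sketch falls back to the
generic initializer from the testcase.

```c
#ifdef __ALTIVEC__
#include <altivec.h>
typedef __vector float v4sf;            /* hypothetical typedef for this sketch */
#else
typedef float v4sf __attribute__((__vector_size__(16)));
#endif

float fa[100] __attribute__((__aligned__(16)));

/* Hypothetical hand-written variant of foo() from the testcase. */
v4sf foo_splat(void)
{
#ifdef __ALTIVEC__
    /* fa is 16-byte aligned, so the element load places fa[0] in lane 0;
       the splat then replicates lane 0 across the vector. */
    return vec_splat(vec_lde(0, fa), 0);
#else
    /* Portable fallback: same initializer as in the testcase. */
    float f = fa[0];
    v4sf vf = {f, f, f, f};
    return vf;
#endif
}
```

This only works because the programmer knows fa[0] sits on a 16-byte
boundary; the point of the report is that the compiler has the same
information but loses it before the vector initializer is expanded.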
The problem is that rs6000_expand_vector_init doesn't know that f0
originates from an aligned address. It gets the following as vals:
(parallel:V4SF [
(reg/v:SF 119 [ f ])
(reg/v:SF 119 [ f ])
(reg/v:SF 119 [ f ])
(reg/v:SF 119 [ f ])
])
We somehow want to expand 'f = fa[0]' and '{f,f,f,f}' together... if
expand_vector_init could get this as vals: '{fa[0],fa[0],fa[0],fa[0]}', it
could see that the original address is aligned.
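To make the two forms concrete, here is a small sketch that writes both of
them out at the source level and checks that they produce the same vector.
The function names and the v4sf typedef are made up for this example. It only
shows that the forms are semantically equivalent; whether the expander
actually receives the second form with the alignment information intact is
exactly the open question, and this sketch does not demonstrate it.

```c
typedef float v4sf __attribute__((__vector_size__(16)));

float fa[100] __attribute__((__aligned__(16)));

/* Form from the testcase: the scalar goes through a temporary first. */
v4sf splat_via_temp(void)
{
    float f = fa[0];
    v4sf vf = {f, f, f, f};
    return vf;
}

/* Form discussed above: the initializer names the aligned element directly. */
v4sf splat_direct(void)
{
    v4sf vf = {fa[0], fa[0], fa[0], fa[0]};
    return vf;
}

/* Returns 1 when both forms yield four lanes equal to fa[0], else 0. */
int splat_forms_agree(void)
{
    v4sf a = splat_via_temp();
    v4sf b = splat_direct();
    for (int i = 0; i < 4; i++)
        if (a[i] != b[i] || a[i] != fa[0])
            return 0;
    return 1;
}
```

Comparing the assembly generated for splat_via_temp and splat_direct with
gcc -O2 -maltivec would show whether the direct form avoids the extra
{lfs,stfs} in a given compiler version.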
Alternatively, the prospects of getting rid of the redundant load and store
later on during some kind of a peephole optimization don't seem so high to
me... Thoughts?
This may be related to PR31334 (though there the issue is about initialization
with constants, so I'm not sure if the idea for a solution proposed there would
help us here).
--
Summary: bad codegen for vector initialization in Altivec
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
GCC build triplet: powerpc-linux
GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32107