Take the following testcase (either on spu-elf or powerpc-linux-gnu with
-maltivec):

#define vector __attribute__((__vector_size__(16) ))
typedef vector float vec_float4;

typedef struct
{
  vec_float4 data;
} VecFloat4;

typedef struct
{
  vec_float4 a;
  vec_float4 b;
} VecFloat4x2;

VecFloat4 test1(VecFloat4 a, VecFloat4 b)
{
  a.data = a.data+b.data;
  return a;
}

VecFloat4x2 test2(VecFloat4x2 data)
{
  data.a = data.a+data.a;
  data.b = data.b+data.b;
  return data;
}

----- cut -----

Right now we do (for spu-elf, it is a similar issue for PPC):

_Z5test211VecFloat4x2:
        hbr     .L5,$lr
        stqd    $sp,-128($sp)
        ai      $sp,$sp,-128
        stqd    $3,64($sp)
        stqd    $4,80($sp)
        lqd     $5,80($sp)
        lqd     $4,64($sp)
        fa      $2,$5,$5
        fa      $3,$4,$4
        stqd    $2,48($sp)
        stqd    $3,32($sp)
        lqd     $4,48($sp)
        lqd     $3,32($sp)
        ai      $sp,$sp,128
.L5:
        bi      $lr

----- cut -----

With the patch which I will attach, we get:

_Z5test211VecFloat4x2:
        fa      $2,$3,$3
        hbr     .L5,$lr
        stqd    $sp,-128($sp)
        ai      $sp,$sp,-128
        nop     127
        stqd    $3,64($sp)
        fa      $3,$4,$4
        stqd    $4,80($sp)
        nop     127
        stqd    $2,32($sp)
        ori     $4,$3,0
        stqd    $3,48($sp)
        ori     $3,$2,0
        ai      $sp,$sp,128
.L5:
        bi      $lr

----- cut -----

Notice how the loads are gone. Note dse could do the same.

--
           Summary: postreload can handle the case where the memory
                    locations use different modes
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pinskia at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33790