------- Comment #5 from pinskia at gcc dot gnu dot org  2007-03-25 22:55 -------
(In reply to comment #4)
> I do not believe the patch will help with the original missed optimization
> because the backend never sees a direct assignment from the CONSTRUCTOR -- it
> already is placed in memory.  The example in comment #2 is different.

Did I miss something?  For the original example, the back-end does see the
CONSTRUCTOR in rs6000_expand_vector_init, as a parallel with mode V4SI.
The expansion of vect_cst_.30 + {4, 4, 4, 4} calls rs6000_expand_vector_init
with vals being:

(parallel:V4SI ([ (const_int 4),(const_int 4),(const_int 4),(const_int 4) ]) )

So when we call easy_vector_constant with vals, we always get false, because
vals is a parallel and not a const_vector.  My patch changes this so that we
call easy_vector_constant with a const_vector instead; the loop above that
call has already proved that vals is made up only of constant elements, since
n_var is zero.

For the original testcase, the assembly now looks like this (on
powerpc-darwin):

_main1:
        mfspr r0,256
        stw r0,-4(r1)
        oris r0,r0,0xc00c
        mtspr 256,r0
        lis r2,ha16(LC0)
        la r2,lo16(LC0)(r2)
        vspltisw v0,4
        lvx v1,0,r2
        li r2,23
        mtctr r2
        vor v12,v0,v0
        mr r0,r3
        vor v13,v1,v1
        vadduwm v0,v1,v0
L2:
        vadduwm v13,v13,v0
        vadduwm v0,v0,v12
        bdnz L2
        vsldoi v0,v13,v13,8
        addi r2,r1,-20
        lwz r12,-4(r1)
        vadduwm v0,v0,v13
        vsldoi v1,v0,v0,12
        vadduwm v1,v1,v0
        stvewx v1,0,r2
        lwz r3,-20(r1)
        add r3,r3,r0
        mtspr 256,r12
        blr
        .const
        .align 4
LC0:
        .long 0
        .long 1
        .long 2
        .long 3

Notice how we now have vspltisw outside the loop.


-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31334
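
To illustrate why it matters that the constant reaches the "easy" check in a
recognisable form, here is a minimal sketch in C.  It is not the code in
rs6000.c: is_splat_immediate is a hypothetical helper, and it assumes a V4SI
constant is "easy" only when all four elements are equal and the value fits
the 5-bit signed immediate of vspltisw (-16 to 15).  The real
easy_vector_constant may accept more cases than this.

```c
#include <stdbool.h>

/* Hypothetical helper, for illustration only (not GCC's
   easy_vector_constant).  Returns true when the four constant
   elements can be materialised by a single vspltisw, and stores
   the immediate to use in *imm.  */
static bool
is_splat_immediate (const int elts[4], int *imm)
{
  /* Every element must equal the first one (a "splat").  */
  for (int i = 1; i < 4; i++)
    if (elts[i] != elts[0])
      return false;

  /* vspltisw takes a 5-bit signed immediate.  */
  if (elts[0] < -16 || elts[0] > 15)
    return false;

  *imm = elts[0];
  return true;
}
```

Under these assumptions {4, 4, 4, 4} passes and yields the immediate 4,
matching the "vspltisw v0,4" in the listing, while {0, 1, 2, 3} fails and
still has to be loaded from the constant pool, as LC0 is.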