Testcase (compile at -O2 -maltivec -ftree-vectorize):

int a[16*100];
int f(int e)
{
  int i;
  for(i = 0;i<16*100;i++)
    e += a[i];
  return e;
}

--------- Cut -----
Currently we get:

  ivtmp.42 = (long unsigned int) &a;
  vect_var_.36 = { 0, 0, 0, 0 };

<bb 3>:
  vect_var_.36 = MEM[index: ivtmp.42] + vect_var_.36;
  ivtmp.42 = ivtmp.42 + 16;
  if (ivtmp.42 != (long unsigned int) (&a + 6400))
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 4>:
  vect_var_.39 = vect_var_.36 v>> 64;
  vect_var_.47 = vect_var_.39 + vect_var_.36;
  vect_var_.48 = vect_var_.47 v>> 32;
  stmp_var_.38 = BIT_FIELD_REF <vect_var_.48 + vect_var_.47, 32, 96>;
  return stmp_var_.38 + e;

The last add (stmp_var_.38 + e) is extra and does not need to be done: we can
get rid of it by initially setting vect_var_.36 to {e, 0, 0, 0}.

Note that this also happens with a constant nonzero start, that is:

int a[16*100];
int f(int e)
{
  int i;
  e = 1;
  for(i = 0;i<16*100;i++)
    e += a[i];
  return e;
}

--
           Summary: Reduction with nonzero start (arbitrary also) causes an
                    extra add to happen
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pinskia at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32825
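
For illustration only, the two strategies can be modelled in plain scalar C. This is a rough sketch, not what GCC emits: the 4-lane accumulator array stands in for vect_var_.36, and the names sum_zero_init, sum_seeded_init, N and LANES are made up for this example. It assumes the sums stay within the range of int.

```c
#include <stddef.h>

#define N (16 * 100)
#define LANES 4            /* four 32-bit ints per 128-bit vector */

int a[N];

/* Models the current code: the accumulator starts at { 0, 0, 0, 0 }
   and e is added after the lanes have been reduced.  */
int sum_zero_init(int e)
{
    int acc[LANES] = { 0, 0, 0, 0 };

    for (size_t i = 0; i < N; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[l] += a[i + l];

    int s = acc[0] + acc[1] + acc[2] + acc[3];   /* epilogue reduction */
    return s + e;                                /* the extra add */
}

/* Models the suggestion: the accumulator starts at { e, 0, 0, 0 },
   so the epilogue reduction already includes e.  */
int sum_seeded_init(int e)
{
    int acc[LANES] = { e, 0, 0, 0 };

    for (size_t i = 0; i < N; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[l] += a[i + l];

    return acc[0] + acc[1] + acc[2] + acc[3];    /* no trailing add */
}
```

Both functions return the same value for any e where no overflow occurs; the second simply folds the start value into the initial vector instead of adding it after the loop.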