http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #17 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-02-01 17:04:38 UTC ---
(In reply to comment #15)
> The vectorizer does not apply because it does not match the canonical
> form of a reduction: here is the reduction after graphite-identity:
>
>   # l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
> S1: l12_lower_188 = l12__lsm.18_179;
>   l12_lower_184 = D.1589_34 + l12_lower_188;
> S2: l12__lsm.18_154 = l12_lower_184;
>
> Without S1 and S2, this would be recognized as a reduction by the
> vectorizer.
>
> Why do we end up with the two extra copies?
> Here is the original code:
>
>   # l12_lower_5 = PHI <l12_lower_4(4), l12_lower_36(6)>
>   l12_lower_36 = D.1589_321 + l12_lower_5;
>
> Graphite does the following:
>
>   l12_lower_5 = *l12_43(D);
>   l12_lower_36 = D.1589_321 + l12_lower_5;
>   *l12_43(D) = l12_lower_36;
>
> Note that at this point we cannot construct this code because we use
> data references and we are in Gimple form:
>
>   *l12_43(D) = D.1589_321 + *l12_43(D);
>
> So I think that the code produced by Graphite is fine, and the problem
> is in the cleanups that we're doing after: for instance loop invariant
> motion could be improved to avoid the extra two statements S1 and S2:
>
>   # l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
> S1: l12_lower_188 = l12__lsm.18_179;
>   l12_lower_184 = D.1589_34 + l12_lower_188;
> S2: l12__lsm.18_154 = l12_lower_184;

Well, LIM needs a copyprop to clean up after it - but the cleanups after
graphite are in a strange order.  LIM is also not really the pass that
is supposed to do scalarization of the memory temporary.

> I also have tried to run pass_rename_ssa_copies but that would just
> rename the base variable l12__lsm.18 into l12_lower and wait for the
> out-of-SSA to remove the extra copies.  Constant propagation does not
> help either... any other suggestions?
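For readers less familiar with the GIMPLE dumps, a minimal C sketch of the two shapes being discussed may help. It assumes the reduction accumulates into the object `*l12` and that `D.1589` is a per-iteration value; the names `sum_in_memory`, `sum_scalarized`, `d`, `n` and `acc` are hypothetical and do not come from the testcase.

```c
#include <stddef.h>

/* Hypothetical illustration: the reduction as Graphite leaves it, with the
   accumulator living in memory (*acc = d[i] + *acc on every iteration). */
void sum_in_memory(const long *d, size_t n, long *acc)
{
    for (size_t i = 0; i < n; i++) {
        long tmp = *acc;   /* load, like l12_lower_5 = *l12_43(D) */
        tmp = d[i] + tmp;  /* add, like l12_lower_36 = D.1589_321 + l12_lower_5 */
        *acc = tmp;        /* store back, like *l12_43(D) = l12_lower_36 */
    }
}

/* The same loop after store motion: the accumulator is kept in a scalar
   (the role of the _lsm temporary), loaded once before the loop and stored
   once after it.  With no extra copies inside the loop, the update is a
   single PHI-carried addition, i.e. the canonical reduction shape. */
void sum_scalarized(const long *d, size_t n, long *acc)
{
    long lsm = *acc;       /* load hoisted out of the loop */
    for (size_t i = 0; i < n; i++)
        lsm = d[i] + lsm;  /* one add per iteration, no S1/S2 copies */
    *acc = lsm;            /* store sunk out of the loop */
}
```

Hoisting the load and sinking the store is only valid when nothing else in the loop can write `*acc`, for instance when `acc` does not alias `d`; that is the condition LIM has to establish before it introduces the `_lsm` temporary.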
I'd suggest

  NEXT_PASS (pass_graphite);
    {
      struct opt_pass **p = &pass_graphite.pass.sub;
      NEXT_PASS (pass_graphite_transforms);
      NEXT_PASS (pass_lim);
      NEXT_PASS (pass_copy_prop);
      NEXT_PASS (pass_dce_loop);
    }
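What the pass_copy_prop + pass_dce_loop pair is expected to do with the S1/S2 copies quoted above can be illustrated with a toy model. This is a hypothetical sketch over a made-up three-address representation (`struct stmt`, `copy_prop`, `dce_copies` and `dump` are not GCC's data structures or functions), and it assumes a store of the reduction result after the loop is the only use outside the loop body.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy IR, not GCC's gimple statements. */
enum kind { PHI, COPY, ADD, STORE };

struct stmt {
    enum kind kind;
    const char *lhs;    /* SSA name defined (or memory written, for STORE) */
    const char *op[2];  /* PHI: two args; ADD: op[0] + op[1]; COPY/STORE: op[0] */
    int dead;
};

static int nops(const struct stmt *s)
{
    return (s->kind == COPY || s->kind == STORE) ? 1 : 2;
}

/* Copy propagation: for every copy "lhs = src", rewrite all uses of lhs
   to src.  Chains of copies resolve because uses rewritten by an earlier
   copy are rewritten again when a later copy is processed. */
static void copy_prop(struct stmt *s, int n)
{
    for (int i = 0; i < n; i++) {
        if (s[i].kind != COPY)
            continue;
        for (int j = 0; j < n; j++)
            for (int k = 0; k < nops(&s[j]); k++)
                if (strcmp(s[j].op[k], s[i].lhs) == 0)
                    s[j].op[k] = s[i].op[0];
    }
}

/* Is NAME still read by any live statement? */
static int used(const struct stmt *s, int n, const char *name)
{
    for (int j = 0; j < n; j++)
        if (!s[j].dead)
            for (int k = 0; k < nops(&s[j]); k++)
                if (strcmp(s[j].op[k], name) == 0)
                    return 1;
    return 0;
}

/* Dead code elimination restricted to copies whose result is unused. */
static void dce_copies(struct stmt *s, int n)
{
    for (int i = 0; i < n; i++)
        if (s[i].kind == COPY && !used(s, n, s[i].lhs))
            s[i].dead = 1;
}

/* Print the live statements in a GIMPLE-like syntax. */
static void dump(const struct stmt *s, int n)
{
    for (int i = 0; i < n; i++) {
        if (s[i].dead)
            continue;
        switch (s[i].kind) {
        case PHI:
            printf("# %s = PHI <%s, %s>\n", s[i].lhs, s[i].op[0], s[i].op[1]);
            break;
        case ADD:
            printf("%s = %s + %s;\n", s[i].lhs, s[i].op[0], s[i].op[1]);
            break;
        case COPY:
        case STORE:
            printf("%s = %s;\n", s[i].lhs, s[i].op[0]);
            break;
        }
    }
}
```

Fed the PHI, S1, the addition, S2 and that store, `copy_prop` rewrites the addition to read l12__lsm.18_179 directly and the PHI argument and the store to read l12_lower_184; `dce_copies` then marks S1 and S2 dead, leaving a PHI plus a single addition, the shape the vectorizer recognizes as a reduction.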