http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #17 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-02-01 17:04:38 UTC ---
(In reply to comment #15)
> The vectorizer does not apply because it does not match the canonical
> form of a reduction: here is the reduction after graphite-identity:
> 
>         # l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
> S1:        l12_lower_188 = l12__lsm.18_179;
>         l12_lower_184 = D.1589_34 + l12_lower_188;
> S2:        l12__lsm.18_154 = l12_lower_184;
> 
> Without S1 and S2, this would be recognized as a reduction by the
> vectorizer.
> 
> Why we end up with the two extra copies?
> Here is the original code:
> 
>         # l12_lower_5 = PHI <l12_lower_4(4), l12_lower_36(6)>
>         l12_lower_36 = D.1589_321 + l12_lower_5;
> 
> Graphite does the following:
> 
>         l12_lower_5 = *l12_43(D);
>         l12_lower_36 = D.1589_321 + l12_lower_5;
>         *l12_43(D) = l12_lower_36;
> 
> Note that at this point we cannot construct this code because we use
> data references and we are in Gimple form:
> 
>         *l12_43(D) = D.1589_321 + *l12_43(D);
> 
> So I think that the code produced by Graphite is fine, and the problem
> is in the cleanups that we're doing after: for instance loop invariant
> motion could be improved to avoid the extra two statements S1 and S2:
> 
>         # l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
> S1:        l12_lower_188 = l12__lsm.18_179;
>         l12_lower_184 = D.1589_34 + l12_lower_188;
> S2:        l12__lsm.18_154 = l12_lower_184;

Well, LIM needs a copyprop to clean up after it - but the cleanups
after graphite are in a strange order.  LIM is also not really the
pass that is supposed to do scalarization of the memory temporary.
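
As a rough sketch (derived by hand from the dump quoted above, not taken
from an actual compiler run), copy propagation followed by DCE would be
expected to forward S1 and S2 into their uses and leave something like:

        # l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12_lower_184(7)>
        l12_lower_184 = D.1589_34 + l12__lsm.18_179;

which has the same PHI-plus-add shape as the original code, assuming
there are no other uses that keep the copies alive.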

> I also have tried to run pass_rename_ssa_copies but that would just
> rename the base variable l12__lsm.18 into l12_lower and wait for the
> out-of-SSA to remove the extra copies.  Constant propagation does not
> help either... any other suggestions?

I'd suggest

          NEXT_PASS (pass_graphite);
            {
              struct opt_pass **p = &pass_graphite.pass.sub;
              NEXT_PASS (pass_graphite_transforms);
              NEXT_PASS (pass_lim);
              NEXT_PASS (pass_copy_prop);
              NEXT_PASS (pass_dce_loop);
            }
