https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99504
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2021-03-10

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that in the pixel case we have an aggregate assignment:

  <bb 3> [local count: 955630225]:
  # p_17 = PHI <p_10(6), p_5(D)(5)>
  # q_18 = PHI <q_9(6), q_6(D)(5)>
  # i_19 = PHI <i_12(6), 0(5)>
  q_9 = q_18 + 4;
  p_10 = p_17 + 4;
  *p_17 = *q_18;
  i_12 = i_19 + 1;
  if (n_8(D) != i_12)
    goto <bb 6>; [89.00%]
  else
    goto <bb 4>; [11.00%]

and that's not handled by vectorization or dependence analysis.  We might
want to consider applying the same folding to this as we do for memcpy folding
and turn it into

  _42 = MEM<unsigned int> [q_18, (pixel *)0];
  MEM<unsigned int> [q_17, (pixel *)0] = _42;
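
For readers without the original test case at hand, the following is a minimal sketch of the kind of source that could produce a loop like the one quoted above. It is an illustration, not the reporter's code: only the type name `pixel` comes from the comment, while the field layout and the function names `copy_aggregate` and `copy_scalar` are assumptions. The second function writes out by hand the shape the comment proposes, a 32-bit load followed by a 32-bit store, using fixed-size `memcpy` calls so that it stays valid C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte aggregate. The name `pixel` appears in the comment;
   the field layout is an assumption made for this sketch. */
typedef struct { uint8_t r, g, b, a; } pixel;

_Static_assert(sizeof(pixel) == sizeof(uint32_t), "pixel must be 4 bytes");

/* Aggregate form: every iteration is a struct assignment, which is the
   `*p_17 = *q_18` statement in the quoted dump. */
void copy_aggregate(pixel *p, const pixel *q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = q[i];
}

/* Scalar form: every element goes through a 32-bit temporary, i.e. one
   scalar load and one scalar store per iteration. */
void copy_scalar(pixel *p, const pixel *q, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t tmp;
        memcpy(&tmp, &q[i], sizeof tmp);  /* load 4 bytes from the source */
        memcpy(&p[i], &tmp, sizeof tmp);  /* store 4 bytes to the destination */
    }
}
```

Both functions copy the same bytes; the difference is only in how the per-element copy is expressed. Whether a particular GCC release treats the two loops differently depends on the version and flags, so comparing the output of `-fdump-tree-optimized` for each function is the way to check.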