https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99504

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2021-03-10

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that in the pixel case we have an aggregate assignment:

  <bb 3> [local count: 955630225]:
  # p_17 = PHI <p_10(6), p_5(D)(5)>
  # q_18 = PHI <q_9(6), q_6(D)(5)>
  # i_19 = PHI <i_12(6), 0(5)>
  q_9 = q_18 + 4;
  p_10 = p_17 + 4;
  *p_17 = *q_18;
  i_12 = i_19 + 1;
  if (n_8(D) != i_12)
    goto <bb 6>; [89.00%]
  else
    goto <bb 4>; [11.00%]

and that's not handled by vectorization or dependence analysis.
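
For reference, here is a C loop of roughly the shape that yields a dump like the one above. It is a hypothetical reconstruction, not the reporter's testcase. The `pixel` layout (four `unsigned char` members) is an assumption based only on the `+ 4` pointer increments, and `copy_pixels` is an illustrative name. The struct assignment in the body is what GIMPLE keeps as the aggregate copy `*p_17 = *q_18`.

```c
/* Hypothetical pixel type: four bytes, matching the "+ 4" pointer
   increments in the dump.  Not taken from the original testcase. */
typedef struct {
    unsigned char r, g, b, a;
} pixel;

/* Hypothetical reconstruction: copies n pixels with a plain struct
   assignment, which stays an aggregate copy (*p = *q) in GIMPLE. */
void copy_pixels(pixel *p, const pixel *q, int n)
{
    for (int i = 0; i < n; i++)
        *p++ = *q++;
}
```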

We might want to consider applying the same folding to this as we do for
memcpy folding and turn it into

  _42 = MEM<unsigned int> [q_18, (pixel *)0];
  MEM<unsigned int> [p_17, (pixel *)0] = _42;
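
The following source-level analogue sketches what the proposed folding amounts to: each 4-byte aggregate copy becomes one `unsigned int` load and one `unsigned int` store. It is only an illustration under assumptions. The `pixel` layout is hypothetical, `copy_pixels_as_uint` is an illustrative name, and the static assertion assumes that a pixel has the same size as `unsigned int`. This is a hand-written rewrite, not what GCC does internally.

```c
#include <string.h>

/* Hypothetical 4-byte pixel type, inferred from the "+ 4" increments. */
typedef struct {
    unsigned char r, g, b, a;
} pixel;

/* The rewrite is only valid if one pixel is exactly one unsigned int. */
_Static_assert(sizeof(pixel) == sizeof(unsigned int),
               "pixel must be the size of unsigned int");

/* Copies n pixels, moving each one as a single unsigned int. */
void copy_pixels_as_uint(pixel *p, const pixel *q, int n)
{
    for (int i = 0; i < n; i++) {
        unsigned int tmp;                  /* plays the role of _42 */
        memcpy(&tmp, &q[i], sizeof tmp);   /* load:  _42 = MEM<unsigned int> [q...] */
        memcpy(&p[i], &tmp, sizeof tmp);   /* store: MEM<unsigned int> [p...] = _42 */
    }
}
```

Using fixed-size `memcpy` calls rather than casting to `unsigned int *` avoids both strict-aliasing and alignment problems. GCC typically folds such calls into single loads and stores, which is the memcpy folding referred to above. On strict-alignment targets it may fall back to narrower accesses, because `pixel` is only byte-aligned.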
