https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66646

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-06-24
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, even if it is a small loop, the theory is that mem* inline expansion will
produce better code than the loop copying chars.

niter is

  <bb 2>:
  if (flag_6(D) == 1)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:

  <bb 4>:
  # prephitmp_41 = PHI <-1(OVF)(2), 2(3)>

...
  _3 = (unsigned short) prephitmp_41;
  _30 = _3 + 65535;
  _48 = (sizetype) _30;

here and the loop is guarded with

  <bb 5>:
  # i_28 = PHI <i_25(7), 0(4)>
  if (prephitmp_41 > 0)

There is a pre-existing issue of an (OVF) constant in the IL (that's a no-no)
and a missed jump threading that would expose the constant.  There is also
range info on the size argument of the memmove at expansion time:

  # RANGE [1, 2] NONZERO 3
  _47 = _48 + 1;
  __builtin_memmove (_36, _33, _47);


but we don't seem to have a target/middle-end expander for BUILT_IN_MEMMOVE.
So that's a missed optimization there.
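
The size arithmetic in the dump above can be checked by hand with a small C
sketch.  It assumes prephitmp_41 is a 16-bit signed value and models sizetype
as size_t; the function name size_from_niter is hypothetical and only mirrors
the three GIMPLE statements plus the final "+ 1".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the dump (types are assumptions):
     _3  = (unsigned short) prephitmp_41;
     _30 = _3 + 65535;        wraps modulo 65536, i.e. _3 - 1
     _48 = (sizetype) _30;
     _47 = _48 + 1;           the memmove size argument  */
static size_t size_from_niter(int16_t prephitmp)
{
    uint16_t v3 = (uint16_t)prephitmp;
    /* C promotes to int, so cast back to get 16-bit wraparound. */
    uint16_t v30 = (uint16_t)(v3 + 65535u);
    size_t v48 = (size_t)v30;
    return v48 + 1;
}
```

For inputs 1 and 2 this yields sizes 1 and 2, matching RANGE [1, 2]; the -1
arm of the PHI would give 65535 but is excluded by the prephitmp_41 > 0 guard.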

Without loop distribution we fail to peel the inner loop as well (on the
tree level), because

Loop 2 iterates at most 32767 times.

so we fail to compute a proper upper bound.  The very same issue is
present during loop distribution so it can't know the loop iterates only
1 or 2 times.

Apart from special-casing this memmove in RTL expansion we could also enhance
the memory builtin folders (in gimple-fold.c) to honor range information
and in this case expand the memmove to something more optimal, like

  if (_47 == 1)
    *_36 = *_33;
  else
    *(unsigned short *)_36 = *(unsigned short *)_33;

Of course, creating control flow here is not expected (so dealing with this
at RTL expansion time is easier).
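
As a source-level illustration of the expansion suggested above, here is a
minimal C sketch, not what GCC emits.  The name move_1_or_2 is hypothetical,
and the caller is assumed to guarantee a size of 1 or 2 (the RANGE [1, 2]
info).  The two-byte case goes through a temporary so that it stays correct
for overlapping and unaligned buffers, which memmove must support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: memmove replacement for n known to be 1 or 2. */
static void move_1_or_2(void *dst, const void *src, size_t n)
{
    assert(n == 1 || n == 2);   /* assumption taken from the range info */
    if (n == 1) {
        *(unsigned char *)dst = *(const unsigned char *)src;
    } else {
        uint16_t tmp;
        memcpy(&tmp, src, sizeof tmp);  /* load both bytes first ...  */
        memcpy(dst, &tmp, sizeof tmp);  /* ... then store them        */
    }
}
```

Compilers typically turn the fixed-size memcpy calls into a single 16-bit load
and store, which is the code the proposed expansion is after.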


Thus confirmed, but this also applies to similar loops (niter bound not
precise) before the SCEV changes.
