https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66646
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-06-24
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, even if it is a small loop the theory is that mem* inline expansion will
produce better code than the loop copying chars.

niter is

  <bb 2>:
  if (flag_6(D) == 1)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:

  <bb 4>:
  # prephitmp_41 = PHI <-1(OVF)(2), 2(3)>
...
  _3 = (unsigned short) prephitmp_41;
  _30 = _3 + 65535;
  _48 = (sizetype) _30;

here and the loop is guarded with

  <bb 5>:
  # i_28 = PHI <i_25(7), 0(4)>
  if (prephitmp_41 > 0)

there is a pre-existing issue of a (OVF) constant in the IL (that's a no-no)
and a missed jump-threading to expose the constant.

There is also range info on the size argument of the memmove at expansion
time:

  # RANGE [1, 2] NONZERO 3
  _47 = _48 + 1;
  __builtin_memmove (_36, _33, _47);

but we don't seem to have a target/middle-end expander for BUILT_IN_MEMMOVE.
So that's a missed optimization there.

Without loop distribution we fail to peel the inner loop as well (on the tree
level), because

Loop 2 iterates at most 32767 times.

so we fail to compute a proper upper bound. The very same issue is present
during loop distribution so it can't know the loop iterates only 1 or 2 times.

Apart from special-casing this memmove in RTL expansion we could also enhance
the memory builtin folders (in gimple-fold.c) to honor range information
and in this case expand the memmove to something more optimal, like

  if (_47 == 1)
    *_36 = *_33;
  else
    *(unsigned short *)_36 = *(unsigned short *)_33;

of course creating control-flow here is not expected (so dealing with this
at RTL expansion time is easier).

Thus confirmed - but applies to similar loops (niter bound not precise) before
the SCEV changes.
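
For illustration only, the range-based expansion suggested above can be written as ordinary C. This is a rough sketch, not GCC code: `copy_1_or_2` is a hypothetical helper name, and it assumes the caller guarantees the size is 1 or 2, mirroring the `RANGE [1, 2]` recorded for `_47`. The two-byte case goes through a temporary with fixed-size `memcpy` calls rather than an `unsigned short` cast, which keeps the portable C version free of alignment and aliasing problems while preserving memmove semantics for overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): copy n bytes from src to dst,
   where the caller guarantees n is 1 or 2. */
void copy_1_or_2(void *dst, const void *src, size_t n)
{
    assert(n == 1 || n == 2);   /* assumed precondition, from the range info */

    if (n == 1) {
        /* One byte: a single char load and store. */
        *(unsigned char *)dst = *(const unsigned char *)src;
    } else {
        /* Two bytes: read the whole source into a temporary before
           writing, so overlapping src/dst behave as with memmove. */
        uint16_t tmp;
        memcpy(&tmp, src, sizeof tmp);
        memcpy(dst, &tmp, sizeof tmp);
    }
}
```

A call such as `copy_1_or_2(buf + 1, buf, 2)` should leave `buf` in the same state as `memmove(buf + 1, buf, 2)`. Whether the branch is cheaper than a library call depends on the target, which is one reason the comment prefers handling this at RTL expansion time.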