https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69908
--- Comment #7 from Yuri Gribov <tetra2005 at gmail dot com> ---
(In reply to Marc Glisse from comment #6)
> (In reply to Yuri Gribov from comment #5)
> > Well, as we all know there are a lot of missing optimizations in GCC :) I
> > think the real question is whether it's ever going to be fixed if there's no
> > standard API for this code pattern which we can recognize as builtin.
> >
> > I believe the answer is "No". ATM GCC does not vectorize even the simplest
> > memcpy equivalent code:
> > // gcc tmp.c -O3 -mtune=native -ftree-vectorize -o- -S
> > void memcpy_(char * __restrict a, char * __restrict b, unsigned n) {
> >   unsigned i;
> >   for (i = 0; i < n; ++i)
> >     a[i] = b[i];
> > }
>
> Please look again. ldist turns this into a call to memcpy. And if you
> disable ldist, it does get vectorized.

Hm, I've just tried r249806 both with -ftree-loop-distribution and
-fno-tree-loop-distribution on top of flags above without any changes in
output. This may depend on revision/flags/machine, which ones did you use?
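
For reference, a minimal sketch of what the transformation described in
comment #6 would amount to at the source level, assuming the buffers do not
overlap (which is what __restrict promises). The name memcpy_ldist_ is
hypothetical and used only for illustration; this is not actual GCC output.

```c
#include <string.h>

/* Original byte-copy loop from comment #5. */
void memcpy_(char * __restrict a, char * __restrict b, unsigned n) {
  unsigned i;
  for (i = 0; i < n; ++i)
    a[i] = b[i];
}

/* Hypothetical hand-written equivalent: because a and b are declared
   __restrict (non-overlapping), the whole loop can be replaced by a
   single library call that copies n bytes from b to a. */
void memcpy_ldist_(char * __restrict a, char * __restrict b, unsigned n) {
  memcpy(a, b, n);
}
```

Comparing the assembly generated for the two functions is one way to see
whether a given revision/flag combination performs the replacement.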