https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294
--- Comment #33 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Mateusz Guzik from comment #32)
> For non-simd asm you can do at most 8 bytes per one mov instruction.
> 
> Stock gcc resorts to rep movsq for sizes bigger than 40 bytes. Telling it to
> not use rep movsq results in loops of 4 movsq instructions (aka 32 bytes per
> iteration).
> 
> An ok upper limit to still do this instead of punting to libcall is 256
> bytes.
> 
> In case of -mno-simd I'm advocating for issuing the 32-byte (aka 4 store)
> loops up to 256 bytes and punting to libcall otherwise.
> 
> Fully unrolling these would raise numerous eyebrows due to i-cache footprint
> and I don't believe this is warranted.

With SIMD (AVX-512 ZMM registers), one store can move up to 64 bytes.
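For illustration, the non-SIMD strategy proposed in comment #32 (32 bytes per iteration as four 8-byte moves, inline only up to 256 bytes, library call above that) can be sketched at the C level. This is a hypothetical sketch, not what GCC's backend emits: the names `copy_no_simd` and `INLINE_COPY_LIMIT` are made up for the example, the tail handling is a simple word-then-byte loop chosen for clarity, and the instructions the compiler actually generates from this C depend on target and optimization flags (it may vectorize or otherwise transform the loop).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff mirroring the 256-byte limit proposed in comment #32. */
#define INLINE_COPY_LIMIT 256

/* Hypothetical helper: copy n bytes using 8-byte words, four per iteration
   (32 bytes), and defer to the library memcpy above the limit. */
static void *copy_no_simd(void *dst, const void *src, size_t n)
{
    /* Large copies: "punt to libcall". */
    if (n > INLINE_COPY_LIMIT)
        return memcpy(dst, src, n);

    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Main loop: four 8-byte loads, then four 8-byte stores (32 bytes).
       memcpy into a uint64_t is the portable way to do an unaligned load. */
    while (n >= 32) {
        uint64_t w0, w1, w2, w3;
        memcpy(&w0, s, 8);
        memcpy(&w1, s + 8, 8);
        memcpy(&w2, s + 16, 8);
        memcpy(&w3, s + 24, 8);
        memcpy(d, &w0, 8);
        memcpy(d + 8, &w1, 8);
        memcpy(d + 16, &w2, 8);
        memcpy(d + 24, &w3, 8);
        s += 32;
        d += 32;
        n -= 32;
    }

    /* Remaining whole 8-byte words. */
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, 8);
        memcpy(d, &w, 8);
        s += 8;
        d += 8;
        n -= 8;
    }

    /* Trailing bytes (fewer than 8). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

For scale: at 8 bytes per store, a 256-byte copy needs 32 stores (8 iterations of the main loop); at 64 bytes per store, the same copy needs 4.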