http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60879
--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Jakub Jelinek from comment #1)
> Does this ever matter though? I mean, wouldn't we expand it as move by
> pieces or store by pieces for such small constant length anyway and thus
> never reach the target movmem/setmem expansion?

Move by pieces and store by pieces are only very efficient for targets with
unaligned integer and vector moves/stores:

[hjl@gnu-6 partial]$ cat w.i
void
foo5 (const void *src, void *dest, int s)
{
  __builtin_memcpy (dest, src, 23);
}
[hjl@gnu-6 partial]$ gcc -S -O2 w.i
[hjl@gnu-6 partial]$ cat w.s
	.file	"w.i"
	.text
	.p2align 4,,15
	.globl	foo5
	.type	foo5, @function
foo5:
.LFB0:
	.cfi_startproc
	movq	(%rdi), %rax
	movq	%rax, (%rsi)
	movq	8(%rdi), %rax
	movq	%rax, 8(%rsi)
	movl	16(%rdi), %eax
	movl	%eax, 16(%rsi)
	movzwl	20(%rdi), %eax
	movw	%ax, 20(%rsi)
	movzbl	22(%rdi), %eax
	movb	%al, 22(%rsi)
	ret

I am working on a different set/mov memory strategy to generate

	movdqu	(%rdi), %xmm0
	movups	%xmm0, (%rsi)
	movq	15(%rdi), %rax
	movq	%rax, 15(%rsi)
	ret

instead, by setting MOVE_RATIO to 1 and handling most of the set/mov memory
expansion in the x86 backend.