https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97891
--- Comment #5 from andysem at mail dot ru ---
Using a register is beneficial even for bytes and words if there are multiple mov instructions. But there has to be a single reg0 for all the movs.

I'm not very knowledgeable about gcc internals, but would it be beneficial to implement this at a higher level than instruction transformation? I.e. so that instead of this:

  a = 0;
  b = 0;
  c = 0;

we have:

  any reg0 = 0; // any represents a type compatible with any fundamental or enum type
  a = reg0;
  b = reg0;
  c = reg0;

This way, reg0 would be in a single register, and that xorl instruction could be subject to other tree optimizations.

With tree-level optimization, another thing to note is the vectorizer. I know gcc can sometimes merge adjacent initializations that have no padding between them into a single larger initialization instruction. For example:

  struct A
  {
    long a1;
    long a2;

    A() : a1(0), a2(0)
    {
    }
  };

  void test(A* p, unsigned int count)
  {
    for (unsigned int i = 0; i < count; ++i)
    {
      p[i] = A();
    }
  }

  test(A*, unsigned int):
          testl   %esi, %esi
          je      .L1
          leal    -1(%rsi), %eax
          pxor    %xmm0, %xmm0
          salq    $4, %rax
          leaq    16(%rdi,%rax), %rax
  .L3:
          movups  %xmm0, (%rdi)
          addq    $16, %rdi
          cmpq    %rax, %rdi
          jne     .L3
  .L1:
          ret

I would like this to still work.
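For anyone who wants to experiment with the scalar case, here is a minimal sketch of a test case. It is not taken from the report: the struct Mixed and the functions clear_direct and clear_shared are hypothetical names chosen for illustration. clear_shared only approximates the proposed "any reg0" form at source level. An optimizing compiler will most likely propagate the constant and treat both functions identically, so this shows the intended semantics and says nothing about how gcc would represent it internally.

```cpp
#include <cstdint>

// Hypothetical test case: three stores of zero to members of different widths.
struct Mixed {
    std::uint8_t  a;
    std::uint16_t b;
    std::uint32_t c;
};

// The usual source form: a = 0; b = 0; c = 0;
void clear_direct(Mixed* m) {
    m->a = 0;
    m->b = 0;
    m->c = 0;
}

// Source-level approximation of the proposed form: one shared zero value
// that is narrowed to each destination type. "reg0" is only a local
// variable here; whether it ends up in a register is up to the compiler.
void clear_shared(Mixed* m) {
    const std::uint32_t reg0 = 0;
    m->a = static_cast<std::uint8_t>(reg0);
    m->b = static_cast<std::uint16_t>(reg0);
    m->c = reg0;
}
```

Compiling this with something like g++ -O2 -S and comparing the two functions shows what a given gcc version actually emits for the stores. The result depends on the version, target and flags.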