https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

James Greenhalgh <jgreenhalgh at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jgreenhalgh at gcc dot gnu.org

--- Comment #3 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
Created attachment 43988
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43988&action=edit
Reduced testcase

I believe this testcase shows the issue being reported here. Clang seems to
spot this is essentially a memset across the array, while GCC doesn't.

On AArch64 with Clang:

  .LBB1_9:                        // =>This Inner Loop Header: Depth=1
        stp     q0, q0, [x8, #-16]
        subs    x20, x20, #8            // =8
        add     x8, x8, #32             // =32
        b.ne    .LBB1_9

On x86-64 with Clang:

  .LBB1_9:                        # =>This Inner Loop Header: Depth=1
        movups  %xmm0, -144(%rax,%rcx,4)
        movups  %xmm0, -128(%rax,%rcx,4)
        movups  %xmm0, -112(%rax,%rcx,4)
        movups  %xmm0, -96(%rax,%rcx,4)
        movups  %xmm0, -80(%rax,%rcx,4)
        movups  %xmm0, -64(%rax,%rcx,4)
        movups  %xmm0, -48(%rax,%rcx,4)
        movups  %xmm0, -32(%rax,%rcx,4)
        movups  %xmm0, -16(%rax,%rcx,4)
        movups  %xmm0, (%rax,%rcx,4)
        addq    $40, %rcx
        cmpq    $100036, %rcx           # imm = 0x186C4
        jne     .LBB1_9

GCC doesn't spot this.

On the other hand G++'s inlining of the various random number initialisation
routines really hammers Clang, which ends up emulating 128-bit arithmetic on
AArch64.
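
For readers without the attachment to hand, the kind of loop being discussed
can be sketched roughly as below. This is NOT the attached testcase: the
names fill_constant and make_filled are hypothetical, and float is assumed
only because the listings above store 4-byte elements (four per 16-byte
vector store). It simply illustrates a store of one loop-invariant value
across an array, which is the memset-like pattern a vectoriser may turn into
wide broadcast stores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the reduced testcase (attachment 43988 is not
// reproduced here). Every element receives the same loop-invariant value.
void fill_constant(float *data, std::size_t n, float value) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = value;  // identical store each iteration: memset-like
    }
}

// Convenience wrapper; the name and signature are illustrative only.
std::vector<float> make_filled(std::size_t n, float value) {
    std::vector<float> v(n);
    fill_constant(v.data(), n, value);
    return v;
}
```

Whether a given compiler emits unrolled vector stores for this sketch depends
on the version, target and optimisation flags, so comparing the generated
assembly (for example with -O2 or -O3 and -S) is the way to check.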