https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82729
Bug ID: 82729
Summary: adjacent small objects can be initialized with a single store (but aren't for char a[] = "a")
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

void ext(char *, char *, char *);
void foo(void) {
    char abc[] = "abc";
    char ab[] = "ab";
    char a[] = "a";
    ext(a, ab, abc);
}

gcc 8.0.0 20171024 -O3   https://godbolt.org/g/mFNUgn

foo:    # -march=bdver3 to avoid moving to 32-bit registers first
        subq    $24, %rsp
        leaq    12(%rsp), %rdx
        leaq    9(%rsp), %rsi
        leaq    7(%rsp), %rdi
        # these 4 stores only need 2 instructions
        movl    $6513249, 12(%rsp)
        movw    $25185, 9(%rsp)
        movb    $0, 11(%rsp)         # last byte of ab[]
        movw    $97, 7(%rsp)
        call    ext
        addq    $24, %rsp
        ret

-march=haswell still avoids movw $imm16, (mem), even though Haswell doesn't have LCP stalls. But that's not what this bug is about.

A single push imm32 or mov $imm32, r/m64 could store a[] and ab[], because sign-extension will produce 4 bytes of zeros in the high half. We only need one of those zeros to terminate the string. If you don't want to waste the extra 3 bytes of padding, simply have the next store overlap it.

Or keeping the layout identical:

        ...
        movq    $0x62610061, 7(%rsp)   # zero some of the bytes for abc[]
        #memory at 7(%rsp) = 'a', 0, 'a', 'b', 0, 0 (rsp+12), 0, 0
        movl    $6513249, 12(%rsp)     # then initialize abc[]
        ...

x86 CPUs generally have good support for overlapping stores. E.g. store-forwarding still works from the movq to a load of a[] or ab[], and also works from the movl to a load from abc[].

Related: bug 82142, padding in structs stopping store merging. But this isn't padding, it's merging across separate objects that are / can be placed next to each other on the stack.
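To sanity-check the byte layout claimed for the two overlapping stores, here is a minimal C sketch that emulates them with memcpy on a plain byte buffer. The helper name init_merged and the 9-byte buffer (standing in for 7(%rsp) through 15(%rsp)) are made up for illustration, and it assumes a little-endian target such as x86. It only models which bytes end up in memory; it is not what GCC emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: buf[0] stands for the byte at 7(%rsp),
   so a[] is at buf+0, ab[] at buf+2 and abc[] at buf+5.
   Assumes little-endian byte order, as on x86. */
static void init_merged(unsigned char buf[9])
{
    /* movq $imm32, m64 sign-extends the 32-bit immediate to 64 bits;
       the immediate is positive, so the high 4 bytes are zero. */
    int64_t q = (int32_t)0x62610061;   /* bytes: 'a', 0, 'a', 'b', 0, 0, 0, 0 */
    uint32_t l = 6513249;              /* 0x00636261, bytes: 'a', 'b', 'c', 0 */

    memcpy(buf, &q, sizeof q);         /* like movq $0x62610061, 7(%rsp) */
    memcpy(buf + 5, &l, sizeof l);     /* like movl $6513249, 12(%rsp), overlapping 3 bytes */
}
```

After calling init_merged(buf), the three objects read back as the strings "a" (at buf), "ab" (at buf + 2) and "abc" (at buf + 5), which matches the layout the original four stores produce.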
----

On ARM, we can take advantage of redundancy between the string data as well, instead of using a string constant for ab[] and a literal pool with a pointer + abc[].

# This is dumb:  ARM gcc 6.3.0
.L3:
        .word   .LC0        # Should just store ab[] literally here
        .word   6513249
.LC0:
        .ascii  "ab\000"

You can also do stuff like  add r1, r1, 'c' LSL 16  to append the 'c' byte if you have "ab" in a register. Or if it's a common suffix instead of prefix, left-shift and add. Or start with the 4-byte object (including the terminating zero) and AND out the characters.

IDK if this is a common enough pattern to spend time searching for that much redundancy between constant initializers.

But I think on x86 it would be a good idea to zero a register instead of doing more than 2 or 3 repeated movq $0, (mem), especially when the addressing mode is RIP-relative (can't micro-fuse immediate + RIP-relative addressing mode), or otherwise uses a 32-bit displacement. (Code-size matters.)
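The register tricks for deriving one string constant from another come down to plain integer arithmetic on the little-endian word values. A rough C sketch of the three variants follows; the function names are invented for illustration, and the values are what would sit in a register before being stored as a little-endian word.

```c
#include <stdint.h>

/* Common prefix: have "ab" (0x6261), want "abc" (0x636261).
   Mirrors the idea of  add r1, r1, 'c' LSL 16. */
static uint32_t append_char(uint32_t ab, unsigned char c)
{
    return ab + ((uint32_t)c << 16);
}

/* Common suffix: have "bc" (0x6362), want "abc".
   Left-shift by one byte, then add the new first character. */
static uint32_t prepend_char(uint32_t bc, unsigned char c)
{
    return (bc << 8) + c;
}

/* Start from the 4-byte "abc" object (terminator included)
   and AND out the third character to get "ab". */
static uint32_t keep_two_chars(uint32_t abc)
{
    return abc & 0x0000FFFFu;
}
```

With these, append_char(25185, 'c') and prepend_char(0x6362, 'a') both give 6513249, the same word the x86 code stores for abc[], and keep_two_chars(6513249) gives 25185, the halfword stored for ab[].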