https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82729
--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Richard Biener from comment #1)
> The issue is we have no merging of stores at the RTL level and the GIMPLE
> level doesn't know whether the variables will end up allocated next to each
> other.

Are bug reports like this useful at all? A good fraction of the missed-optimization bugs I file seem to be things that gcc doesn't really have the infrastructure to find. I'm hoping they help improve gcc in the long run, at least. I could try to learn more about gcc internals to find out on my own why it misses them before filing, but either way it seems potentially useful to document efficient asm possibilities, even if gcc's current design makes it hard to take advantage of them.

Anyway, could GIMPLE notice that multiple small objects are being written and hint to RTL that it would be useful to allocate them in a certain way? (And give RTL a merged store that RTL would have to split if it decides not to?)

A more conservative approach could still be an improvement. Can RTL realize that it can use 4-byte stores that overlap into not-yet-initialized or otherwise dead memory? For -march=haswell or generic, we get:

    movl    $97, %edx
    movl    $25185, %eax        # avoid an LCP stall on Nehalem or earlier
    movw    %dx, 7(%rsp)
    ... lea
    movl    $6513249, 12(%rsp)
    movw    %ax, 9(%rsp)
    movb    $0, 11(%rsp)

This is pretty bad for code size. The following does the same thing with no merging between objects, just by knowing when to allow a store to overlap into other objects:

    movl    $0x61, 7(%rsp)      # imm32 is still shorter than mov imm32 -> reg plus a 16-bit store
    movl    $0x6261, 9(%rsp)
    movl    $0x636261, 12(%rsp)

(Teaching gcc that mov $imm16 is safe on Sandybridge-family is a separate bug, I guess. On Sandybridge-family, only other instructions with an imm16 suffer LCP (length-changing prefix) stalls, unlike Nehalem and earlier, where mov $imm16 is a problem too. Silvermont marks instruction lengths in the cache to avoid LCP stalls entirely, and gcc knows that.)