[Bug rtl-optimization/82729] adjacent small objects can be initialized with a single store (but aren't for char a[] = "a")

peter at cordes dot ca Thu, 26 Oct 2017 04:29:41 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82729


--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Richard Biener from comment #1)
> The issue is we have no merging of stores at the RTL level and the GIMPLE
> level doesn't know whether the variables will end up allocated next to each
> other.

Are bug reports like this useful at all?  It seems that a good fraction of the
missed-optimization bugs I file are things that gcc doesn't really have the
infrastructure to find.  I'm hoping it's helping to improve gcc in the long
run, at least.  I guess I could try to learn more about gcc internals to find
out why it misses them on my own before filing, but either way it seems
potentially useful to document efficient asm possibilities even if gcc's
current design makes it hard to take advantage.


Anyway, could GIMPLE notice that multiple small objects are being written and
hint to RTL that it would be useful to allocate them in a certain way?  (And
give RTL a merged store that RTL would have to split if it decides not to?)

Or a more conservative approach could still be an improvement.  Can RTL realize
that it can use 4-byte stores that overlap into not-yet-initialized or
otherwise dead memory?

For -march=haswell or generic we get:

        movl    $97, %edx
        movl    $25185, %eax       # avoid an LCP stall on Nehalem or earlier
        movw    %dx, 7(%rsp)
        ... lea
        movl    $6513249, 12(%rsp)
        movw    %ax, 9(%rsp)
        movb    $0, 11(%rsp)

This is pretty bad for code-size.  The following would do the same thing with
no merging between objects, just by knowing when a store is allowed to overlap
into other objects:

        # imm32 still shorter than a mov imm32 -> reg and 16-bit store
        movl    $0x61, 7(%rsp)
        movl    $0x6261, 9(%rsp)
        movl    $0x636261, 12(%rsp)
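
To make the byte-level claim concrete, here is a rough C sketch that models the
three overlapping 4-byte stores and compares the result with initializing each
object separately.  The offsets 7, 9 and 12 come from the asm above; the
16-byte frame, the 0xAA filler and the names store32le, init_separate,
init_overlapping and overlapping_matches_separate are made up for
illustration and are not anything in gcc.  It only models the memory contents,
not what the compiler would emit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 32-bit value in little-endian byte order,
   the way an x86 `movl $imm32, off(%rsp)` would. */
static void store32le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Reference: each object ("a", "ab", "abc") is initialized on its own,
   at the stack offsets used in the asm (7, 9 and 12). */
static void init_separate(unsigned char frame[16])
{
    memset(frame, 0xAA, 16);          /* assumed filler for untouched bytes */
    memcpy(frame + 7,  "a",   2);     /* 'a', 0 */
    memcpy(frame + 9,  "ab",  3);     /* 'a', 'b', 0 */
    memcpy(frame + 12, "abc", 4);     /* 'a', 'b', 'c', 0 */
}

/* Proposed: three 4-byte stores in ascending address order.  Each store
   spills zeros into the next object, which the following store overwrites. */
static void init_overlapping(unsigned char frame[16])
{
    memset(frame, 0xAA, 16);
    store32le(frame + 7,  0x61);      /* writes bytes 7..10 */
    store32le(frame + 9,  0x6261);    /* writes bytes 9..12 */
    store32le(frame + 12, 0x636261);  /* writes bytes 12..15 */
}

/* Returns nonzero if both strategies leave identical bytes in the frame. */
int overlapping_matches_separate(void)
{
    unsigned char a[16], b[16];
    init_separate(a);
    init_overlapping(b);
    return memcmp(a, b, 16) == 0;
}

/* Convenience wrapper for callers that just want a hard failure. */
void self_check(void)
{
    assert(overlapping_matches_separate());
}
```

The order of the stores matters in this sketch: each wide store clobbers the
start of the next object with zeros, so the stores have to be issued from the
lowest address up (or the overlapped bytes have to be dead otherwise).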


(Teaching gcc that mov $imm16 is safe on Sandybridge-family is a separate bug,
I guess.  It's only other instructions with an imm16 that LCP stall, unlike on
Nehalem and earlier where mov $imm16 is a problem too.  Silvermont marks
instruction lengths in the cache to avoid LCP stalls entirely, and gcc knows
that.)
