https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82729

            Bug ID: 82729
           Summary: adjacent small objects can be initialized with a
                    single store (but aren't for char a[] = "a")
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

void ext(char *, char *, char *);

void foo(void) {
    char abc[] = "abc";
    char ab[] = "ab";
    char a[] = "a";
    ext(a, ab, abc);
}

gcc 8.0.0 20171024 -O3   https://godbolt.org/g/mFNUgn

foo:   # -march=bdver3, to avoid moving to 32-bit registers first
        subq    $24, %rsp
        leaq    12(%rsp), %rdx
        leaq    9(%rsp), %rsi
        leaq    7(%rsp), %rdi

        # these 4 stores only need 2 instructions
        movl    $6513249, 12(%rsp)
        movw    $25185, 9(%rsp)        
        movb    $0, 11(%rsp)           # last byte of ab[]
        movw    $97, 7(%rsp)


        call    ext
        addq    $24, %rsp
        ret
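
For reference, the decimal immediates in that output are just the string bytes packed in little-endian order.  A minimal C sketch of the arithmetic (the names pack_le and check_immediates are made up for illustration; nothing here is GCC code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack the first n bytes of s (n <= 8) into an
   integer the way a little-endian store lays them out in memory. */
uint64_t pack_le(const char *s, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* The immediates gcc emitted, recomputed from the initializers. */
void check_immediates(void)
{
    assert(pack_le("abc", 4) == 6513249);  /* movl: 'a','b','c',0 */
    assert(pack_le("ab", 2) == 25185);     /* movw: 'a','b' (the 0 is the movb) */
    assert(pack_le("a", 2) == 97);         /* movw: 'a',0 */
}
```

Calling check_immediates() passes silently if the constants match the strings.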

-march=haswell still avoids movw $imm16, (mem), even though Haswell doesn't
have LCP stalls.  But that's not what this bug is about.

A single  push imm32  or  mov $imm32, r/m64  could store a[] and ab[], because
sign-extension will produce 4 bytes of zeros in the high half.  We only need
one of those zeros to terminate the string.  If you don't want to waste the
extra 3 bytes of padding, simply have the next store overlap it.

Or keeping the layout identical:

        ...
        movq    $0x62610061, 7(%rsp)   # zero some of the bytes for abc[]
         #memory at 7(%rsp) = 'a', 0, 'a', 'b', 0, 0 (rsp+12), 0, 0
        movl    $6513249, 12(%rsp)     # then initialize abc[]
        ...

x86 CPUs generally have good support for overlapping stores.  For example,
store-forwarding still works from the movq to a load of a[] or ab[], and also
works from the movl to a load from abc[].
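
The two-store sequence above can be checked with a small C model of the stack frame.  This is only a sketch: frame[k] stands in for k(%rsp), store_le is a hypothetical helper that mimics a little-endian store of a given width, and the 0xAA poison value is an arbitrary choice to expose any byte the stores fail to write.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write the low `size` bytes of v at p in little-endian order,
   which is what an x86 store of that width does. */
static void store_le(unsigned char *p, uint64_t v, int size)
{
    for (int i = 0; i < size; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Model of the proposed code: frame[k] stands for k(%rsp). */
void init_merged(unsigned char frame[24])
{
    /* movq $0x62610061, 7(%rsp): the imm32 is sign-extended to 64 bits */
    store_le(frame + 7, (uint64_t)(int64_t)(int32_t)0x62610061, 8);
    /* movl $6513249, 12(%rsp): overlaps zeros written by the movq */
    store_le(frame + 12, 6513249, 4);
}

/* Verify that all three strings come out as the original layout expects. */
void check_merged_layout(void)
{
    unsigned char frame[24];
    memset(frame, 0xAA, sizeof frame);   /* poison the frame first */
    init_merged(frame);
    assert(strcmp((char *)frame + 7, "a") == 0);
    assert(strcmp((char *)frame + 9, "ab") == 0);
    assert(strcmp((char *)frame + 12, "abc") == 0);
}
```

The model only covers the byte values, not store-forwarding behaviour, which depends on the microarchitecture.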

Related: bug 82142 (padding in structs stopping store merging).  But this isn't
padding; it's merging across separate objects that are, or can be, placed next
to each other on the stack.


----


On ARM, we could also take advantage of redundancy between the string data,
instead of using a string constant for ab[] plus a literal pool that holds a
pointer to it and the value of abc[].

# This is dumb:  ARM gcc 6.3.0
.L3:
        .word   .LC0      # Should just store ab[] literally here
        .word   6513249
.LC0:
        .ascii  "ab\000"

You can also do stuff like  add r1, r1, 'c' LSL 16  to append the 'c' byte if
you have "ab" in a register.  Or if it's a common suffix instead of prefix,
left-shift and add.  Or start with the 4-byte object (including the terminating
zero) and AND out the characters.  IDK if this is a common enough pattern to
spend time searching for that much redundancy between constant initializers.
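
The register arithmetic described above can be written out in C as a sanity check.  This is an illustrative sketch only: the function names are hypothetical, the words are assumed to hold the strings in little-endian byte order, and len must be between 1 and 3 so the shifts stay inside a 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Common prefix: "ab" -> "abc", like  add r1, r1, 'c' LSL 16  when len == 2. */
uint32_t append_char(uint32_t word, int len, char c)
{
    return word + ((uint32_t)(unsigned char)c << (8 * len));
}

/* Common suffix: "bc" -> "abc": left-shift by one byte, then add the new
   first character. */
uint32_t prepend_char(uint32_t word, char c)
{
    return (word << 8) + (unsigned char)c;
}

/* Start from the longer object and AND out characters: "abc" -> "ab". */
uint32_t truncate_to(uint32_t word, int len)
{
    return word & ((UINT32_C(1) << (8 * len)) - 1);
}

void check_derivations(void)
{
    assert(append_char(25185, 2, 'c') == 6513249);   /* "ab"  -> "abc" */
    assert(prepend_char(0x6362, 'a') == 6513249);    /* "bc"  -> "abc" */
    assert(truncate_to(6513249, 2) == 25185);        /* "abc" -> "ab"  */
    assert(truncate_to(6513249, 1) == 97);           /* "abc" -> "a"   */
}
```

Each derivation costs one extra ALU instruction per string, which is the trade-off against loading a second constant.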

But I think on x86 it would be a good idea to zero a register instead of doing
more than 2 or 3 repeated  movq $0, (mem)   especially when the addressing mode
is RIP-relative (can't micro-fuse immediate + RIP-relative addressing mode), or
otherwise uses a 32-bit displacement.  (Code-size matters.)
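
To put rough numbers on the code-size point, here is a small C sketch that compares the two strategies for n stores of zero with a RIP-relative destination.  The byte counts are a hand count of the usual x86-64 encodings rather than measured compiler output, and the sketch looks only at code size, not at uop counts or micro-fusion.

```c
#include <assert.h>

enum {
    SIZE_MOVQ_IMM_RIP = 11, /* movq $0, x(%rip): REX.W + C7 + ModRM + disp32 + imm32 */
    SIZE_MOVQ_REG_RIP = 7,  /* movq %rax, x(%rip): REX.W + 89 + ModRM + disp32 */
    SIZE_XOR_ZERO     = 2   /* xorl %eax, %eax: 31 C0 */
};

/* n stores that each carry their own 4-byte zero immediate. */
int bytes_imm_stores(int n)
{
    return n * SIZE_MOVQ_IMM_RIP;
}

/* Zero a register once, then store it n times. */
int bytes_reg_stores(int n)
{
    return SIZE_XOR_ZERO + n * SIZE_MOVQ_REG_RIP;
}

void check_sizes(void)
{
    assert(bytes_imm_stores(3) == 33);
    assert(bytes_reg_stores(3) == 23);
}
```

Under these assumptions the zeroed-register form saves 4 bytes per store after paying 2 bytes for the xor.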
