https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79148
            Bug ID: 79148
           Summary: stack addresses are spilled to stack slots on x86-64 at
                    -Os instead of rematerializing the addresses
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: froydnj at gcc dot gnu.org
  Target Milestone: ---

Noticed this while browsing around Firefox source code compiled with GCC 5.4; a colleague confirms that this happens with 6.3 as well. Compiling:

https://people.mozilla.org/~nfroyd/Unified_cpp_widget0.ii.gz

(Tried to get it under the attachment limit with xz, didn't happen)

with options:

-mtune=generic -march=x86-64 -g -Os -std=gnu++11 -fPIC -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -freorder-blocks -fno-omit-frame-pointer -fstack-protector-strong

gives, for the function

_ZN7mozilla6widget11GfxInfoBase20GetFeatureStatusImplEiPiR18nsAString_internalRK8nsTArrayINS0_13GfxDriverInfoEER19nsACString_internalPNS0_15OperatingSystemE

a bit of code that looks like:

.LVL3402:
        leaq    -784(%rbp), %rax        [1a]
.LVL3403:
        movq    %rax, %rdi
.LVL3404:
        movq    %rax, -816(%rbp)        [1b]
        call    _ZN12nsAutoStringC1Ev
.LVL3405:
        .loc 14 887 0
        leaq    -624(%rbp), %rax        [2a]
        movq    %rax, %rdi
        movq    %rax, -824(%rbp)        [2b]
        call    _ZN12nsAutoStringC1Ev
.LVL3406:
        .loc 14 888 0
        leaq    -464(%rbp), %rax        [3a]
        movq    %rax, %rdi
        movq    %rax, -800(%rbp)        [3b]
        call    _ZN12nsAutoStringC1Ev
.LVL3407:
        .loc 14 889 0
        movq    (%r12), %rax
        movq    -816(%rbp), %rsi        [1c]
        movq    %r12, %rdi
        call    *104(%rax)
.LVL3408:
        .loc 14 890 0
        testl   %eax, %eax
        js      .L2479
        movq    (%r12), %rax
        movq    -824(%rbp), %rsi        [2c]
        movq    %r12, %rdi
        call    *120(%rax)
.LVL3409:
        .loc 14 889 0
        testl   %eax, %eax
        js      .L2479
        .loc 14 891 0
        movq    (%r12), %rax
        movq    -800(%rbp), %rsi        [3c]
        movq    %r12, %rdi
        call    *168(%rax)

The problem here, for each of the trio of instructions marked [1], [2], and [3], is that the instructions [1b], [2b], and [3b] that store the stack addresses are really unnecessary; replacing [1c], [2c], and [3c] with the `lea` instructions from [1a], [2a], and [3a] is the same size and doesn't require the stack slot storage, so we could eliminate those instructions ([1b], [2b], and [3b]) and (possibly) make the stack frame smaller as well.

I think rematerializing the stack addresses on x86/x86-64 ought always to be a win in terms of size (I don't know whether you'd want to make the same choices when compiling for speed); I think it'd be a similar win for RISC-y chips, at least so long as the stack frame sizes are reasonably small.
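
To make the "same size" claim concrete, here is a rough byte count for the listing above. It is only a sketch: the helper names (`rbp_operand_length`, `bytes_saved`, `kAddr`, `kSlot`) are made up for illustration, and it assumes the usual x86-64 encoding for a one-byte-opcode instruction with a `disp(%rbp)` operand (REX.W prefix, opcode, ModRM, then an 8-bit or 32-bit displacement). It does not model any savings from a smaller stack frame.

```cpp
#include <cassert>
#include <cstdint>

// Encoded length, in bytes, of a 64-bit instruction with a one-byte opcode
// and a disp(%rbp) memory operand, e.g. leaq (0x8D), movq load (0x8B),
// movq store (0x89). The opcode does not affect the length.
int rbp_operand_length(std::int32_t disp) {
    const int rex = 1;     // REX.W prefix selects the 64-bit operand size
    const int opcode = 1;
    const int modrm = 1;   // %rbp as base needs no SIB byte
    // %rbp as base always carries a displacement:
    // one byte if it fits in a signed 8-bit value, otherwise four.
    const int disp_bytes = (disp >= -128 && disp <= 127) ? 1 : 4;
    return rex + opcode + modrm + disp_bytes;
}

// Offsets copied from the listing in the report.
const std::int32_t kAddr[3] = {-784, -624, -464};  // leaq operands [1a], [2a], [3a]
const std::int32_t kSlot[3] = {-816, -824, -800};  // spill slots used by [b] and [c]

// Bytes saved by deleting the stores [b] and replacing the reloads [c]
// with leaq instructions that recompute the address.
int bytes_saved() {
    int saved = 0;
    for (int i = 0; i < 3; ++i) {
        const int store  = rbp_operand_length(kSlot[i]);  // [b], deleted
        const int reload = rbp_operand_length(kSlot[i]);  // [c], deleted
        const int remat  = rbp_operand_length(kAddr[i]);  // leaq that replaces [c]
        assert(reload == remat);  // the replacement is the same size as the reload
        saved += store + reload - remat;
    }
    return saved;
}
```

Under those assumptions every reload and its replacement `lea` come to 7 bytes each, so the saving is just the three deleted stores. The count leaves the `lea`/`movq %rax, %rdi` pairs at [1a], [2a], and [3a] as they are.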