On 05/19/2015 06:06 PM, Rich Felker wrote:
> And are the above indirect calls/jumps (1983+43) candidates for
> scheduling/hoisting the address load (that's not being done yet), or
> are they the ones the compiler opted not to schedule/hoist? The win
> from relaxation seems small here, but as long as you're not going to
> block optimizations that would preclude relaxing, I don't see any
> disadvantages to doing it.
FWIW, I bootstrapped GCC with LTO and -fpie -fno-plt and counted the calls:
                    count   % of all calls   % of indirect
  total calls      252436
  total indirect    21198        8.4%
    via got         10128        4.0%             48%
    via reg          9007        3.6%             42%
    via data         2063        0.8%             10%
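The table is easy to sanity-check: the three categories sum to the indirect total, the first percentage is relative to all calls, and the second is relative to indirect calls. A quick Python sketch that recomputes them from the quoted counts (the variable and function names are illustrative only):

```python
# Raw counts from the table above; every indirect call falls into one bucket.
counts = {"via got": 10128, "via reg": 9007, "via data": 2063}
total_calls = 252436
total_indirect = sum(counts.values())  # should match the quoted 21198


def share(part: int, whole: int, digits: int) -> str:
    """Format part/whole as a percentage with the given number of decimals."""
    return f"{100.0 * part / whole:.{digits}f}"


# Per bucket: (% of all calls, % of indirect calls), rounded as in the table.
rows = {
    name: (share(n, total_calls, 1), share(n, total_indirect, 0))
    for name, n in counts.items()
}

if __name__ == "__main__":
    print(f"total indirect {total_indirect} ({share(total_indirect, total_calls, 1)}%)")
    for name, (of_all, of_indirect) in rows.items():
        print(f"{name:<9} {counts[name]:>6} ({of_all}% / {of_indirect}%)")
```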
Those via data are things like

    callq  *0x145fdc4(%rip)        # 19c0ea8 <lang_hooks+0x1e8>
    callq  *0x14517cc(%rip)        # 19c0388 <targetm+0x328>

i.e. calls through a function pointer stored at a known address, here members of the lang_hooks and targetm hook tables.
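The buckets can be separated mechanically from objdump -d output. A minimal Python sketch, assuming AT&T-syntax lines shaped like the two above; the regexes and the classify_call name are hypothetical, not the script behind these numbers. A call through a fixed rip-relative slot is either "via got" or "via data" (told apart later by where the slot lives), and every other indirect form counts as "via reg":

```python
import re
from typing import Optional, Tuple

# objdump -d (AT&T syntax) prints a call through a fixed rip-relative slot as
#   callq  *0x145fdc4(%rip)        # 19c0ea8 <lang_hooks+0x1e8>
# where the trailing comment is the already-resolved address of the slot.
RIP_SLOT = re.compile(r"callq?\s+\*-?0x[0-9a-f]+\(%rip\)\s*#\s*([0-9a-f]+)\s*<([^>]*)>")
INDIRECT = re.compile(r"callq?\s+\*")


def classify_call(line: str) -> Tuple[str, Optional[int], Optional[str]]:
    """Classify one disassembled call instruction.

    Returns (kind, slot_address, slot_symbol), where kind is:
      "slot"   - target loaded from a fixed rip-relative address
                 (the "via got" and "via data" buckets),
      "reg"    - any other indirect form: *%reg or a computed address,
      "direct" - a plain call to an immediate address.
    """
    m = RIP_SLOT.search(line)
    if m:
        return "slot", int(m.group(1), 16), m.group(2)
    if INDIRECT.search(line):
        return "reg", None, None
    return "direct", None, None


if __name__ == "__main__":
    for line in (
        "callq  *0x145fdc4(%rip)        # 19c0ea8 <lang_hooks+0x1e8>",
        "callq  *%r13",
    ):
        print(classify_call(line))
```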
Those via reg (a register or a computed address) are also self-explanatory: GCC has all
sorts of hooks and indirection internally, so this is unsurprising. That said,
the very first one I examined,
    000000000056735e <_ZL15omega_free_eqnsP5eqn_di.lto_priv.3334>:
    ...
    56736f: mov 0x144f6f2(%rip),%r13 # 19b6a68 <_DYNAMIC+0x928>
    ...
    567380: sub $0x18,%r12
    567384: test %ebx,%ebx
    567386: js 567394 <_ZL15omega_free_eqnsP5eqn_di.lto_priv.3334+0x36>
    567388: mov 0x28(%rbp,%r12,1),%rdi
    56738d: dec %ebx
    56738f: callq *%r13
    567392: jmp 567380 <_ZL15omega_free_eqnsP5eqn_di.lto_priv.3334+0x22>
    ...
does in fact hoist the address of "free" out of the loop: it is loaded into %r13 once, before the loop, and each iteration simply does callq *%r13.
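The hoisted load at 56736f can be tied to the slot in objdump's comment by redoing the RIP-relative arithmetic. A small Python sketch; the 7-byte instruction length comes from the standard x86-64 encoding of this mov (the listing elides the next instruction), and rip_target is a hypothetical helper name:

```python
def rip_target(insn_addr: int, insn_len: int, disp: int) -> int:
    """Effective address of a RIP-relative operand on x86-64: the signed
    displacement is added to the address of the *next* instruction."""
    return insn_addr + insn_len + disp


# mov 0x144f6f2(%rip),%r13 at 0x56736f is REX.W+R (0x4c), opcode 0x8b,
# a ModRM byte and a 32-bit displacement: 7 bytes in total.
slot = rip_target(0x56736f, 7, 0x144f6f2)
print(hex(slot))  # 0x19b6a68
```

objdump labels that address <_DYNAMIC+0x928> because it names addresses relative to the nearest preceding symbol; presumably .got is laid out right after .dynamic in this binary.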
Those via the GOT can be identified by comparing the slot address against the
dynamic relocations listed by readelf -r. Plenty of them are truly non-local
calls, e.g. into libc, and those obviously cannot be relaxed.
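A sketch of that comparison in Python, under several assumptions: the relocation rows come from x86-64 `readelf -rW` output in the two shapes noted in the docstring; the .got bounds are taken separately (e.g. from `readelf -S`); and, as BFD ld does for non-preemptible symbols in a PIE, a GOT slot for a function defined in the executable carries an R_X86_64_RELATIVE relocation with no symbol. The names parse_readelf_r and classify_slot are hypothetical:

```python
from typing import Dict, NamedTuple, Optional


class Reloc(NamedTuple):
    rtype: str             # e.g. "R_X86_64_GLOB_DAT" or "R_X86_64_RELATIVE"
    symbol: Optional[str]  # None when the row names no symbol (RELATIVE)
    addend: int


def parse_readelf_r(text: str) -> Dict[int, Reloc]:
    """Map relocation offset -> Reloc from x86-64 `readelf -rW` output.

    Assumed row shapes (section titles and header lines are skipped):
      <offset> <info> R_X86_64_GLOB_DAT <symval> <name> + <addend>
      <offset> <info> R_X86_64_RELATIVE <addend>
    """
    relocs: Dict[int, Reloc] = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 4 or not f[2].startswith("R_X86_64_"):
            continue
        offset = int(f[0], 16)
        if len(f) == 4:  # no symbol column, e.g. R_X86_64_RELATIVE
            relocs[offset] = Reloc(f[2], None, int(f[3], 16))
        else:
            addend = 0
            if len(f) >= 7 and f[5] in ("+", "-"):
                addend = int(f[6], 16) * (-1 if f[5] == "-" else 1)
            relocs[offset] = Reloc(f[2], f[4], addend)
    return relocs


def classify_slot(slot: int, relocs: Dict[int, Reloc], got_lo: int, got_hi: int) -> str:
    """Bucket a rip-relative call slot; [got_lo, got_hi) is the .got range."""
    if not got_lo <= slot < got_hi:
        return "via data"                # e.g. a hook table such as targetm
    r = relocs.get(slot)
    if r is None:
        return "via got (no dynamic reloc)"
    if r.symbol is None:
        return "via got (local)"         # slot holds an address inside this PIE
    return "via got (non-local)"         # bound by symbol at load time, e.g. libc
```

Only the "via got (local)" bucket contains candidates for relaxing to a direct call.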
Of those 10128 calls via the GOT, I found EXACTLY ONE that was local: the call to

    _Z22const_0_to_255_operandP7rtx_def12machine_mode

from

    _ZL19ix86_expand_builtinP9tree_nodeP7rtx_defS2_12machine_modei.lto_priv.2163
This is certainly a bug, though I don't know where it originates. There are
plenty of other calls to const_0_to_255_operand, and they are all direct, as
expected. Tracking this one down will likely take significant detective work...
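The observation that every other call to const_0_to_255_operand is direct suggests a cheap filter for spotting similar cases: list callees that are reached both directly and through the GOT. A hypothetical Python sketch, assuming the calls have already been resolved into (caller, callee, kind) records; mixed_callees is an illustrative name, not an existing tool:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def mixed_callees(calls: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Find callees reached both directly and through the GOT.

    calls yields (caller, callee, kind) with kind "direct" or "got".
    Returns callee -> callers that use the GOT, restricted to callees
    that also have at least one direct caller.
    """
    kinds: Dict[str, Set[str]] = defaultdict(set)
    got_callers: Dict[str, List[str]] = defaultdict(list)
    for caller, callee, kind in calls:
        kinds[callee].add(kind)
        if kind == "got":
            got_callers[callee].append(caller)
    return {
        callee: got_callers[callee]
        for callee, seen in kinds.items()
        if {"direct", "got"} <= seen
    }
```

A callee like free, which is only ever reached through the GOT, is not reported; only the inconsistent ones are.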
r~