On 05/19/2015 06:06 PM, Rich Felker wrote:
> And are the above indirect calls/jumps (1983+43) candidates for
> scheduling/hoisting the address load (that's not being done yet), or
> are they the ones the compiler opted not to schedule/hoist? The win
> from relaxation seems small here, but as long as you're not going to
> block optimizations that would preclude relaxing, I don't see any
> disadvantages to doing it.

FWIW, I bootstrapped gcc with lto and -fpie -fno-plt:

  total calls      252436
  total indirect    21198  (8.4%)
    via got         10128  (4.0% / 48%)
    via reg          9007  (3.6% / 42%)
    via data         2063  (0.8% / 10%)

(The paired percentages are the share of all calls / the share of
indirect calls.)

Those via data are things like

  callq  *0x145fdc4(%rip)   # 19c0ea8 <lang_hooks+0x1e8>
  callq  *0x14517cc(%rip)   # 19c0388 <targetm+0x328>

where we have a call to a hook at a known address.

Those via reg (or complex address) are also self-explanatory -- we have
all sorts of hooks and indirection inside gcc, so this is unsurprising.

That said, the very first one I examined,

000000000056735e <_ZL15omega_free_eqnsP5eqn_di.lto_priv.3334>:
  ...
  56736f:  mov    0x144f6f2(%rip),%r13   # 19b6a68 <_DYNAMIC+0x928>
  ...
  567380:  sub    $0x18,%r12
  567384:  test   %ebx,%ebx
  567386:  js     567394 <_ZL15omega_free_eqnsP5eqn_di.lto_priv.3334+0x36>
  567388:  mov    0x28(%rbp,%r12,1),%rdi
  56738d:  dec    %ebx
  56738f:  callq  *%r13
  567392:  jmp    567380 <_ZL15omega_free_eqnsP5eqn_di.lto_priv.3334+0x22>
  ...

does in fact hoist the address of "free" out of the loop.

Those via got can be identified by comparing the slot address against
the dynamic relocations listed by readelf -r.

There are plenty of truly non-local calls, e.g. to libc. These
obviously cannot be relaxed.

Of those 10128 calls via the got, I found EXACTLY ONE that was local, to

  _Z22const_0_to_255_operandP7rtx_def12machine_mode

from

  _ZL19ix86_expand_builtinP9tree_nodeP7rtx_defS2_12machine_modei.lto_priv.2163

This is certain to be a bug, though I don't know where. There are
plenty of other calls to const_0_to_255_operand elsewhere, and they are
all, as expected, direct. This will likely take significant detective
work...


r~
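
For anyone wanting to reproduce a breakdown like the one above, the got/reg/data split can be approximated with a small script. This is only a sketch and not the tool used for the numbers in this message. It assumes AT&T-syntax objdump -d output, and that got_slots is a set of slot addresses already collected from the readelf -r output (that parsing is not shown). The names classify_call and got_slots are hypothetical, and tail calls emitted as jmp are ignored.

```python
import re

# Indirect call through a RIP-relative slot, as objdump prints it, e.g.
#   callq  *0x145fdc4(%rip)        # 19c0ea8 <lang_hooks+0x1e8>
# The captured group is the resolved slot address from the trailing comment.
RIP_CALL = re.compile(r"callq?\s+\*-?0x[0-9a-f]+\(%rip\)\s+#\s+([0-9a-f]+)")

# Any other indirect call: through a register or a more complex address.
INDIRECT = re.compile(r"callq?\s+\*")

# Any call mnemonic at all (used last, so what remains is a direct call).
ANY_CALL = re.compile(r"\bcallq?\b")


def classify_call(line, got_slots):
    """Classify one disassembly line (hypothetical helper).

    got_slots is assumed to be a set of integer addresses that carry a
    dynamic relocation according to readelf -r.
    Returns "got", "data", "reg", "direct", or None for non-call lines.
    """
    m = RIP_CALL.search(line)
    if m:
        slot = int(m.group(1), 16)
        return "got" if slot in got_slots else "data"
    if INDIRECT.search(line):
        return "reg"
    if ANY_CALL.search(line):
        return "direct"
    return None
```

Feeding every line of the disassembly through classify_call and counting the results gives the totals. Separating local from non-local GOT targets still requires looking at the symbol each relocation names.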