On 05/19/2015 06:06 PM, Rich Felker wrote:
> And are the above indirect calls/jumps (1983+43) candidates for
> scheduling/hoisting the address load (that's not being done yet), or
> are they the ones the compiler opted not to schedule/hoist? The win
> from relaxation seems small here, but as long as you're not going to
> block optimizations that would preclude relaxing, I don't see any
> disadvantages to doing it.

FWIW, I bootstrapped gcc with LTO and -fpie -fno-plt.  Percentages below are
of total calls / of indirect calls:

        total calls     252436
        total indirect  21198   (8.4%)
        via got         10128   (4.0% / 48%)
        via reg         9007    (3.6% / 42%)
        via data        2063    (0.8% / 10%)
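
As a quick check on the arithmetic, here is a minimal Python sketch that recomputes the percentages from the raw counts in the table. The counts are copied from the table; the variable and function names are made up for illustration and are not part of any tool used for the measurement.

```python
# Counts copied from the table above.
total_calls = 252436
indirect = {"via got": 10128, "via reg": 9007, "via data": 2063}

# The three categories should add up to the "total indirect" row.
total_indirect = sum(indirect.values())


def pct(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


print(f"total indirect {total_indirect:6d}  "
      f"({pct(total_indirect, total_calls):.1f}%)")
for kind, count in indirect.items():
    # First figure: share of all calls; second: share of indirect calls.
    print(f"{kind:14s} {count:6d}  "
          f"({pct(count, total_calls):.1f}% / {pct(count, total_indirect):.0f}%)")
```

The three categories sum to the 21198 indirect calls, so nothing is double counted or left out.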

Those via data are things like

        callq  *0x145fdc4(%rip) # 19c0ea8 <lang_hooks+0x1e8>
        callq  *0x14517cc(%rip) # 19c0388 <targetm+0x328>

where we have a call to a hook at a known address.

Those via reg (or complex address) are also self-explanatory -- we have all
sorts of hooks and indirection inside gcc, so this is unsurprising.  That said,
the very first one I examined,

000000000056735e <_ZL15omega_free_eqnsP5eqn_di.lto_priv.3334>:
  ...
  56736f: mov    0x144f6f2(%rip),%r13        # 19b6a68 <_DYNAMIC+0x928>
  ...
  567380: sub    $0x18,%r12
  567384: test   %ebx,%ebx
  567386: js     567394 <_ZL15omega_free_eqnsP5eqn_di.lto_priv.3334+0x36>
  567388: mov    0x28(%rbp,%r12,1),%rdi
  56738d: dec    %ebx
  56738f: callq  *%r13
  567392: jmp    567380 <_ZL15omega_free_eqnsP5eqn_di.lto_priv.3334+0x22>
  ...

does in fact hoist the address of "free" out of the loop.


Those via got can be identified by comparing the target address against the
dynamic relocations listed by readelf -r.  There are plenty of truly non-local
calls, e.g. to libc.  These obviously cannot be relaxed.

Of those 10128 calls via the got, I found EXACTLY ONE that was local, to

  _Z22const_0_to_255_operandP7rtx_def12machine_mode

from

  _ZL19ix86_expand_builtinP9tree_nodeP7rtx_defS2_12machine_modei.lto_priv.2163

This is certain to be a bug, though I don't know where.  There are plenty of
other calls to const_0_to_255_operand elsewhere, and they are all, as expected,
direct.  This will likely take significant detective work...



r~
