The key problem here is that EBX is not used in register allocation. If we relax the restriction on EBX, the performance comes back, but there are several failures. Some of them could be fixed. However, I don't like that approach, because EBX is still uninitialized at register allocation time: the initialization (SET_GOT) appears only in the "217r.pro_and_epilogue" pass.
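For anyone who wants to look at the generated code, here is a minimal sketch of the kind of function where this matters. It is only an illustration, not a test case from the GCC testsuite or from the bug report; the names `scale`, `combine` and `dot_scaled` are hypothetical. The loop reads a global (addressed through the GOT in 32-bit PIC code) and calls a non-static function (a call that may go through the PLT, which is where the ABI needs the GOT address in EBX).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global: with -m32 -fPIC its address is found via the GOT. */
int scale = 3;

/* Hypothetical non-static helper: in PIC code a call to it may go
   through the PLT, which expects the GOT address in EBX. */
int combine(int acc, int x)
{
    return acc + x;
}

/* Inner loop that mixes a GOT-relative global access with a call. */
int dot_scaled(const int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);

    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* The read of 'scale' needs the GOT address in some register;
           only the call itself needs it to be EBX. */
        acc = combine(acc, a[i] * b[i] * scale);
    }
    return acc;
}
```

Compiling this with something like `gcc -m32 -O2 -fPIC -S` and reading the assembly should show how EBX is used around the global access and the call; the exact output depends on the compiler version and options.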
The key point of both suggestions is to set the EBX register only prior to a call (as required by the ABI). In all other cases the GOT address could live in any other register.

Evgeny

On Mon, Jul 7, 2014 at 2:42 PM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Mon, Jul 7, 2014 at 12:00 PM, Evgeny Stupachenko <evstu...@gmail.com>
> wrote:
>> Hi All,
>>
>> Currently GCC permanently reserves EBX as the GOT register.
>>
>> (config/i386/i386.c:4289)
>>
>>   /* The PIC register, if it exists, is fixed. */
>>   j = PIC_OFFSET_TABLE_REGNUM;
>>   if (j != INVALID_REGNUM)
>>     fixed_regs[j] = call_used_regs[j] = 1;
>>
>> This leads to significant performance losses in PIC mode:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54232
>> According to my measurements ~3% generally and up to 20% in inner loops.
>>
>> CLANG uses all registers for allocation and therefore now has
>> competitive advantage in 32bits PIC mode comparing to GCC.
>> This mode is used in all Android applications and therefore is
>> important for many compiler customers.
>>
>> There are at least 2 possible solutions.
>>
>> 1.
>>
>> While call expand emit SET_GOT -> EBX and MOV EBX -> some local register:
>> LGOT
>> Prior to each call emit MOV LGOT -> EBX
>> Use LGOT as new GOT register for globals.
>>
>> 2.
>>
>> Set EBX as each CALL parameter.
>> Emit MOV EBX->LGOT in each call.
>> Use LGOT as new GOT register for globals.
>>
>> Do you have any comments, ideas?
>
> Use some LCM algorithm for placing %ebx loads, similar to
> how we treat vzeroupper?
>
> Compute some simple IPA info on whether %ebx is provided/needed
> by callers/callees?
>
> Richard.
>
>> Thanks,
>> Evgeny