On Sun, Aug 16, 2015 at 12:24 PM, Alexander Monakov <amona...@ispras.ru> wrote: > On Sun, 16 Aug 2015, H.J. Lu wrote: > >> prepare_call_address in calls.c is the wrong place to handle -fno-plt. >> We shoudn't force function address into register and hope that load >> function address via GOT and indirect call via register will be folded >> into indirect call via GOT, which doesn't always happen. > > When the address load initially exists separately from the indirect call, the > load can be scheduled or be subject to loop invariant motion. What is your > reason to have them fused right from the start? > > In PR 67215, when you asked for an -O1 testcase, the reporter responded with a > case that demonstrates loop invariant motion on the call address. Why should > that be avoided? (I think it shouldn't, at least not generally)
Load it into a register avoids one load. But using a register for it is bad for x86 which has few registers. Change call foo@PLT to call *foo@GOT to avoid one extra direct branch to PLT is always good for both x86 and x86-64. >> Allso non-PIC >> case can only be handled in backend. Instead, backend should expand >> external function call into indirect call via GOT for -fno-plt. >> >> This patch reverts -fno-plt in prepare_call_address and handles it in >> ix86_expand_call. Other backends may need similar changes to support >> -fno-plt. Alternately, we can introduce a target hook to indicate >> whether an external function should be called via register for -fno-plt >> so that i386 backend can disable it in prepare_call_address. >> >> Any comments? > > Initially my patch was x86-only, but then I opted for the generic calls.c > change, expecting that it would work fine in a machine-independent manner > (which didn't work out, as ARM and AArch64 experience demonstrated). My Nor for PR 67215 on x86. > initial i386 backend patch was much smaller: > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 3263656..cd5f246 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -25577,15 +25578,23 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx > callarg1, > /* Static functions and indirect calls don't need the pic register. */ > if (flag_pic > && (!TARGET_64BIT > + || !flag_plt > || (ix86_cmodel == CM_LARGE_PIC > && DEFAULT_ABI != MS_ABI)) > && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF > && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))) > { > - use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM)); > - if (ix86_use_pseudo_pic_reg ()) > - emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM), > - pic_offset_table_rtx); > + if (flag_plt) > + { > + use_reg (&use, gen_rtx_REG (Pmode, > REAL_PIC_OFFSET_TABLE_REGNUM)); > + if (ix86_use_pseudo_pic_reg ()) > + emit_move_insn (gen_rtx_REG (Pmode, > + REAL_PIC_OFFSET_TABLE_REGNUM), > + pic_offset_table_rtx); > + } > + else > + fnaddr = gen_rtx_MEM (QImode, > + legitimize_pic_address (XEXP (fnaddr, 0), > 0)); > } > } > > (it doesn't apply to current trunk and doesn't handle all cases your patch > handles, but at least it shows how do achieve the goal for "unfused" codegen) But the fused indirect call is what we want for x86. > Couple more comments on your patch below. > >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index 4a0986c..bf8a21d 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -25650,21 +25650,50 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx >> callarg1, >> /* Static functions and indirect calls don't need the pic register. >> Also, >> check if PLT was explicitly avoided via no-plt or "noplt" attribute, >> making >> it an indirect call. */ >> + rtx addr = XEXP (fnaddr, 0); >> if (flag_pic >> - && (!TARGET_64BIT >> - || (ix86_cmodel == CM_LARGE_PIC >> - && DEFAULT_ABI != MS_ABI)) >> - && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF >> - && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)) >> - && flag_plt >> - && (SYMBOL_REF_DECL ((XEXP (fnaddr, 0))) == NULL_TREE >> - || !lookup_attribute ("noplt", >> - DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP (fnaddr, 0)))))) >> + && GET_CODE (addr) == SYMBOL_REF >> + && !SYMBOL_REF_LOCAL_P (addr)) >> { >> - use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM)); >> - if (ix86_use_pseudo_pic_reg ()) >> - emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM), >> - pic_offset_table_rtx); >> + if (flag_plt >> + && (SYMBOL_REF_DECL (addr) == NULL_TREE >> + || !lookup_attribute ("noplt", >> + DECL_ATTRIBUTES (SYMBOL_REF_DECL >> (addr))))) >> + { > > Under what circumstances can SYMBOL_REF_DECL be NULL here? For libcalls? > (I realize that your patch doesn't change the treatment; I just want to know) I don't think it can happen. But it is a separate issue and I want to limit my change to one issue. >> + if (!TARGET_64BIT >> + || (ix86_cmodel == CM_LARGE_PIC >> + && DEFAULT_ABI != MS_ABI)) >> + { >> + use_reg (&use, gen_rtx_REG (Pmode, >> + REAL_PIC_OFFSET_TABLE_REGNUM)); >> + if (ix86_use_pseudo_pic_reg ()) >> + emit_move_insn (gen_rtx_REG (Pmode, >> + REAL_PIC_OFFSET_TABLE_REGNUM), >> + pic_offset_table_rtx); >> + } >> + } >> + else if (!TARGET_PECOFF && !TARGET_MACHO) >> + { >> + if (TARGET_64BIT) >> + { >> + fnaddr = gen_rtx_UNSPEC (Pmode, >> + gen_rtvec (1, addr), >> + UNSPEC_GOTPCREL); >> + fnaddr = gen_rtx_CONST (Pmode, fnaddr); >> + } >> + else >> + { >> + fnaddr = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), >> + UNSPEC_GOT); >> + fnaddr = gen_rtx_CONST (Pmode, fnaddr); >> + fnaddr = gen_rtx_PLUS (Pmode, pic_offset_table_rtx, >> + fnaddr); >> + } >> + fnaddr = gen_const_mem (Pmode, fnaddr); >> + if (GET_MODE (fnaddr) != word_mode) >> + fnaddr = gen_rtx_ZERO_EXTEND (word_mode, fnaddr); >> + fnaddr = gen_rtx_MEM (QImode, fnaddr); >> + } >> } >> } >> >> @@ -25686,9 +25715,13 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx >> callarg1, >> && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF >> && !local_symbolic_operand (XEXP (fnaddr, 0), VOIDmode)) >> fnaddr = gen_rtx_MEM (QImode, construct_plt_address (XEXP (fnaddr, 0))); >> - else if (sibcall >> - ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode) >> - : !call_insn_operand (XEXP (fnaddr, 0), word_mode)) >> + else if (!(TARGET_X32 >> + && MEM_P (fnaddr) >> + && GET_CODE (XEXP (fnaddr, 0)) == ZERO_EXTEND >> + && GOT_memory_operand (XEXP (XEXP (fnaddr, 0), 0), Pmode)) >> + && (sibcall >> + ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode) >> + : !call_insn_operand (XEXP (fnaddr, 0), word_mode))) >> { >> fnaddr = convert_to_mode (word_mode, XEXP (fnaddr, 0), 1); >> fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (word_mode, fnaddr)); > > Perhaps add a comment that GOT slots are 64-bit on x32? > Good idea. I will update my patch. Thanks. -- H.J.