On Sun, Aug 16, 2015 at 12:24 PM, Alexander Monakov <amona...@ispras.ru> wrote:
> On Sun, 16 Aug 2015, H.J. Lu wrote:
>
>> prepare_call_address in calls.c is the wrong place to handle -fno-plt.
>> We shoudn't force function address into register and hope that load
>> function address via GOT and indirect call via register will be folded
>> into indirect call via GOT, which doesn't always happen.
>
> When the address load initially exists separately from the indirect call, the
> load can be scheduled or be subject to loop invariant motion.  What is your
> reason to have them fused right from the start?
>
> In PR 67215, when you asked for an -O1 testcase, the reporter responded with a
> case that demonstrates loop invariant motion on the call address.  Why should
> that be avoided?  (I think it shouldn't, at least not generally)

Load it into a register avoids one load.  But using a register for it
is bad for x86 which has few registers.  Change

call foo@PLT

to

call *foo@GOT

to avoid one extra direct branch to PLT is always good for both x86
and x86-64.

>> Allso non-PIC
>> case can only be handled in backend.  Instead, backend should expand
>> external function call into indirect call via GOT for -fno-plt.
>>
>> This patch reverts -fno-plt in prepare_call_address and handles it in
>> ix86_expand_call.  Other backends may need similar changes to support
>> -fno-plt.  Alternately, we can introduce a target hook to indicate
>> whether an external function should be called via register for -fno-plt
>> so that i386 backend can disable it in prepare_call_address.
>>
>> Any comments?
>
> Initially my patch was x86-only, but then I opted for the generic calls.c
> change, expecting that it would work fine in a machine-independent manner
> (which didn't work out, as ARM and AArch64 experience demonstrated).  My

Nor for PR 67215 on x86.

> initial i386 backend patch was much smaller:
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 3263656..cd5f246 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -25577,15 +25578,23 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx 
> callarg1,
>        /* Static functions and indirect calls don't need the pic register.  */
>        if (flag_pic
>           && (!TARGET_64BIT
> +             || !flag_plt
>               || (ix86_cmodel == CM_LARGE_PIC
>                   && DEFAULT_ABI != MS_ABI))
>           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>         {
> -         use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
> -         if (ix86_use_pseudo_pic_reg ())
> -           emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
> -                           pic_offset_table_rtx);
> +         if (flag_plt)
> +           {
> +             use_reg (&use, gen_rtx_REG (Pmode, 
> REAL_PIC_OFFSET_TABLE_REGNUM));
> +             if (ix86_use_pseudo_pic_reg ())
> +               emit_move_insn (gen_rtx_REG (Pmode,
> +                                            REAL_PIC_OFFSET_TABLE_REGNUM),
> +                               pic_offset_table_rtx);
> +           }
> +         else
> +           fnaddr = gen_rtx_MEM (QImode,
> +                                 legitimize_pic_address (XEXP (fnaddr, 0), 
> 0));
>         }
>      }
>
> (it doesn't apply to current trunk and doesn't handle all cases your patch
> handles, but at least it shows how do achieve the goal for "unfused" codegen)

But the fused indirect call is what we want for x86.

> Couple more comments on your patch below.
>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 4a0986c..bf8a21d 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -25650,21 +25650,50 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx 
>> callarg1,
>>        /* Static functions and indirect calls don't need the pic register.  
>> Also,
>>        check if PLT was explicitly avoided via no-plt or "noplt" attribute, 
>> making
>>        it an indirect call.  */
>> +      rtx addr = XEXP (fnaddr, 0);
>>        if (flag_pic
>> -       && (!TARGET_64BIT
>> -           || (ix86_cmodel == CM_LARGE_PIC
>> -               && DEFAULT_ABI != MS_ABI))
>> -       && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>> -       && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
>> -       && flag_plt
>> -       && (SYMBOL_REF_DECL ((XEXP (fnaddr, 0))) == NULL_TREE
>> -           || !lookup_attribute ("noplt",
>> -                  DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP (fnaddr, 0))))))
>> +       && GET_CODE (addr) == SYMBOL_REF
>> +       && !SYMBOL_REF_LOCAL_P (addr))
>>       {
>> -       use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
>> -       if (ix86_use_pseudo_pic_reg ())
>> -         emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
>> -                         pic_offset_table_rtx);
>> +       if (flag_plt
>> +           && (SYMBOL_REF_DECL (addr) == NULL_TREE
>> +               || !lookup_attribute ("noplt",
>> +                                     DECL_ATTRIBUTES (SYMBOL_REF_DECL 
>> (addr)))))
>> +         {
>
> Under what circumstances can SYMBOL_REF_DECL be NULL here?  For libcalls?
> (I realize that your patch doesn't change the treatment; I just want to know)

I don't think it can happen.  But it is a separate issue and I want to
limit my change to one issue.

>> +           if (!TARGET_64BIT
>> +               || (ix86_cmodel == CM_LARGE_PIC
>> +                   && DEFAULT_ABI != MS_ABI))
>> +             {
>> +               use_reg (&use, gen_rtx_REG (Pmode,
>> +                                           REAL_PIC_OFFSET_TABLE_REGNUM));
>> +               if (ix86_use_pseudo_pic_reg ())
>> +                 emit_move_insn (gen_rtx_REG (Pmode,
>> +                                              REAL_PIC_OFFSET_TABLE_REGNUM),
>> +                                 pic_offset_table_rtx);
>> +             }
>> +         }
>> +       else if (!TARGET_PECOFF && !TARGET_MACHO)
>> +         {
>> +           if (TARGET_64BIT)
>> +             {
>> +               fnaddr = gen_rtx_UNSPEC (Pmode,
>> +                                        gen_rtvec (1, addr),
>> +                                        UNSPEC_GOTPCREL);
>> +               fnaddr = gen_rtx_CONST (Pmode, fnaddr);
>> +             }
>> +           else
>> +             {
>> +               fnaddr = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr),
>> +                                        UNSPEC_GOT);
>> +               fnaddr = gen_rtx_CONST (Pmode, fnaddr);
>> +               fnaddr = gen_rtx_PLUS (Pmode, pic_offset_table_rtx,
>> +                                      fnaddr);
>> +             }
>> +           fnaddr = gen_const_mem (Pmode, fnaddr);
>> +           if (GET_MODE (fnaddr) != word_mode)
>> +             fnaddr = gen_rtx_ZERO_EXTEND (word_mode, fnaddr);
>> +           fnaddr = gen_rtx_MEM (QImode, fnaddr);
>> +         }
>>       }
>>      }
>>
>> @@ -25686,9 +25715,13 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx 
>> callarg1,
>>        && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>        && !local_symbolic_operand (XEXP (fnaddr, 0), VOIDmode))
>>      fnaddr = gen_rtx_MEM (QImode, construct_plt_address (XEXP (fnaddr, 0)));
>> -  else if (sibcall
>> -        ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode)
>> -        : !call_insn_operand (XEXP (fnaddr, 0), word_mode))
>> +  else if (!(TARGET_X32
>> +          && MEM_P (fnaddr)
>> +          && GET_CODE (XEXP (fnaddr, 0)) == ZERO_EXTEND
>> +          && GOT_memory_operand (XEXP (XEXP (fnaddr, 0), 0), Pmode))
>> +        && (sibcall
>> +            ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode)
>> +            : !call_insn_operand (XEXP (fnaddr, 0), word_mode)))
>>      {
>>        fnaddr = convert_to_mode (word_mode, XEXP (fnaddr, 0), 1);
>>        fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (word_mode, fnaddr));
>
> Perhaps add a comment that GOT slots are 64-bit on x32?
>

Good idea.  I will update my patch.

Thanks.

-- 
H.J.

Reply via email to