on 2022/8/10 05:10, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 09, 2022 at 08:51:59PM +0800, Kewen.Lin wrote:
>> on 2022/8/9 18:35, Segher Boessenkool wrote:
>>>> +    /* As ELFv2 ABI shows, the allowable bytes past the global entry
>>>> +       point are 0, 4, 8, 16, 32 and 64.  Considering there are two
>>>> +       non-prefixed instructions for global entry (8 bytes), the count
>>>> +       for patchable NOPs before local entry would be 2, 6 and 14.  */
>>>
>>> The other option is to allow other numbers of nops, but in that case not
>>> have a local entry point (so, always use the global entry point).
>>
>> Good point, it's doable, but it means for the other counts of NOPs, the
>> patched function has to pay the cost of TOC initialization all the time,
>> IMHO it may not be what we want.
> 
> It isn't very expensive: the main benefit of the LEP is not not having
> to do those two insns, but having the r2 setter earlier, allowing loads
> via the TOC reg to execute earlier.
> 

OK.

>>> I don't know if that is useful for any users of this support (if there
>>> even are such users :-P )
>>
>> Yeah, as discussed in PR98125, the powerpc Linux kernel doesn't adopt
>> this feature.  :-P
> 
> Right, -mprofile-kernel is more efficient.
> 
> So maybe just say in the comment that it is possible to support those
> other nop pad sizes, by not doing a LEP at all?  Instead of saying it
> cannot be done :-)

OK, I'll update the comments like:

/* As ELFv2 ABI shows, the allowable bytes past the global entry
   point are 0, 4, 8, 16, 32 and 64 when there is a local entry.
   Considering there are two non-prefixed instructions for global
   entry (8 bytes), the count for patchable NOPs before local entry
   would be 2, 6 and 14.  It's possible to support those other
   counts of NOPs by not doing a local entry at all, but we don't
   have clear use cases for them, so leave them unsupported for
   now.  */
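
To spell out the arithmetic behind 2, 6 and 14, here is a small standalone
sketch.  It is only an illustration and not the code in the patch: the names
lep_offsets, GEP_BYTES, INSN_BYTES and valid_nops_before_lep are made up for
this example, and it assumes every nop is a plain 4-byte instruction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bytes allowed between the global and local entry points, taken from
   the ELFv2 ABI list quoted in the comment above.  */
static const unsigned lep_offsets[] = { 0, 4, 8, 16, 32, 64 };

/* The global entry prologue is two non-prefixed 4-byte instructions.  */
enum { INSN_BYTES = 4, GEP_BYTES = 2 * INSN_BYTES };

/* Return true if NOPS nops can sit between the global entry prologue
   and the local entry point, i.e. if the resulting distance is one of
   the allowed offsets.  */
static bool
valid_nops_before_lep (unsigned nops)
{
  unsigned total = GEP_BYTES + nops * INSN_BYTES;
  for (size_t i = 0; i < sizeof lep_offsets / sizeof lep_offsets[0]; i++)
    if (lep_offsets[i] == total)
      return true;
  return false;
}
```

With this, 16 - 8 = 8 bytes gives 2 nops, 32 - 8 = 24 bytes gives 6, and
64 - 8 = 56 bytes gives 14.  A count of 0 is also accepted, which matches the
patch only checking the count when patch_area_entry > 0.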

> 
>>
>>>
>>>> +    if (patch_area_entry > 0)
>>>> +      {
>>>> +        if (patch_area_entry != 2
>>>> +            && patch_area_entry != 6
>>>> +            && patch_area_entry != 14)
>>>> +          error ("for %<-fpatchable-function-entry=%u,%u%>, patching "
>>>> +                 "%u NOP(s) before function entry is invalid, it can "
>>>> +                 "cause assembler error",
>>>
>>> I would not say "it can [etc.]" at all.  Oh, and "NOP" (capitals) isn't
>>> a thing, it is not an acronym or such ;-)
>>>
>>
>> Poor at wording.  :(  Could you help to suggest some words here? 
> 
> I'll try...
> 
> "unsupported number of nops before function entry (%u)"
> 

Nice, will update with this.

>>>> +/* { dg-require-effective-target powerpc_elfv2 } */
>>>> +/* Specify -mcpu=power9 to ensure global entry is needed.  */
>>>> +/* { dg-options "-mdejagnu-cpu=power9" } */
>>>
>>> Why would it be needed for p9, and not older, or newer?
>>>
>>
>> It can be p8 or p9, but not p10 and later.  
>>
>> It's meant to exclude the pc-relative feature, which can make the test
>> case not generate a global entry point prologue, so the test point
>> would become unavailable.  I thought about adding -mno-pcrel, but guessed
>> it's safer to use a cpu type which doesn't support pcrel at all, since
>> that excludes any possibility of pcrel getting re-enabled.
>>
>> Do you think -mno-pcrel is more elegant and relatively safe?
>> Or just update the comments to make it more meaningful?
> 
> Just use { ! powerpc_pcrel } ?  I don't think you can put that in a
> dg-require-effective-target, but you can do for example
>   dg-do compile { target { ! powerpc_pcrel } }
> or similar.
> 

Good idea, I'll send out a new version of the patch after some testing.

BR,
Kewen
