on 2022/8/10 05:10, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Aug 09, 2022 at 08:51:59PM +0800, Kewen.Lin wrote:
>> on 2022/8/9 18:35, Segher Boessenkool wrote:
>>>> +      /* As ELFv2 ABI shows, the allowable bytes past the global entry
>>>> +         point are 0, 4, 8, 16, 32 and 64.  Considering there are two
>>>> +         non-prefixed instructions for global entry (8 bytes), the count
>>>> +         for patchable NOPs before local entry would be 2, 6 and 14.  */
>>>
>>> The other option is to allow other numbers of nops, but in that case not
>>> have a local entry point (so, always use the global entry point).
>>
>> Good point, it's doable, but it means for the other counts of NOPs, the
>> patched function has to pay the cost of TOC initialization all the time,
>> IMHO it may not be what we want.
>
> It isn't very expensive: the main benefit of the LEP is not not having
> to do those two insns, but having the r2 setter earlier, allowing loads
> via the TOC reg to execute earlier.
>

OK.

>>> I don't know if that is useful for any users of this support (if there
>>> even are such users :-P )
>>
>> Yeah, as the discussions in PR98125, powerpc linux kernel doesn't adopt
>> this feature.  :-P
>
> Right, -mprofile-kernel is more efficient.
>
> So maybe just say in the comment that it is possible to support those
> other nop pad sizes, by not doing a LEP at all?  Instead of saying it
> cannot be done :-)

OK, I'll update the comments like:

      /* As ELFv2 ABI shows, the allowable bytes past the global entry
         point are 0, 4, 8, 16, 32 and 64 when there is a local entry.
         Considering there are two non-prefixed instructions for global
         entry (8 bytes), the count for patchable NOPs before local entry
         would be 2, 6 and 14.  It's possible to support those other
         counts of NOPs by not doing a local entry at all, but we don't
         have clear use cases for them, so leave them unsupported for
         now.  */

>
>>
>>>
>>>> +      if (patch_area_entry > 0)
>>>> +        {
>>>> +          if (patch_area_entry != 2
>>>> +              && patch_area_entry != 6
>>>> +              && patch_area_entry != 14)
>>>> +            error ("for %<-fpatchable-function-entry=%u,%u%>, patching "
>>>> +                   "%u NOP(s) before function entry is invalid, it can "
>>>> +                   "cause assembler error",
>>>
>>> I would not say "it can [etc.]" at all.  Oh, and "NOP" (capitals) isn't
>>> a thing, it is not an acronym or such ;-)
>>>
>>
>> Poor at wording.  :(  Could you help to suggest some words here?
>
> I'll try...
>
>   "unsupported number of nops before function entry (%u)"
>

Nice, will update with this.

>>>> +/* { dg-require-effective-target powerpc_elfv2 } */
>>>> +/* Specify -mcpu=power9 to ensure global entry is needed.  */
>>>> +/* { dg-options "-mdejagnu-cpu=power9" } */
>>>
>>> Why would it be needed for p9, and not older, or newer?
>>>
>>
>> It can be p8 or p9, but not p10 and later.
>>
>> It's meant to exclude pc-relative feature which can make the case not
>> generate a global entry point prologue and the test point will become
>> unavailable.
>> I thought about adding -mno-pcrel, but guessed it's safer
>> to use one cpu type which doesn't support pcrel at all, since it can
>> exclude all possibilities that pcrel gets re-enabled.
>>
>> Do you think -mno-pcrel is more elegant and relatively safe?
>> Or just update the comments to make it more meaningful?
>
> Just use { ! powerpc_pcrel } ?  I don't think you can put that in a
> dg-require-effective-target, but you can do for example
>   dg-do compile { target { ! powerpc_pcrel } }
> or similar.
>

Good idea, I'll send out one new version of the patch after some testing.

BR,
Kewen
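
P.S. For anyone following the arithmetic behind the 2, 6 and 14 counts, here is a minimal standalone sketch. It is not the code from the patch: the function name nops_allow_local_entry is hypothetical, and it assumes every patchable nop is a 4-byte instruction placed right after the two-instruction global entry sequence.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper for illustration only; it is not taken from the
   patch.  Returns true if NOPS 4-byte nops placed after the global
   entry code leave the local entry point at an allowed offset.  */
bool
nops_allow_local_entry (unsigned int nops)
{
  /* Byte offsets past the global entry point that the comment quoted
     in this thread lists as allowable under the ELFv2 ABI.  */
  static const unsigned int allowed_offsets[] = { 0, 4, 8, 16, 32, 64 };
  /* Two non-prefixed instructions set up r2 at the global entry.  */
  const unsigned int global_entry_bytes = 2 * 4;
  unsigned int offset = global_entry_bytes + nops * 4;
  size_t count = sizeof allowed_offsets / sizeof allowed_offsets[0];

  for (size_t i = 0; i < count; i++)
    if (allowed_offsets[i] == offset)
      return true;
  return false;
}
```

With these assumptions, 2, 6 and 14 nops give offsets of 16, 32 and 64 bytes, which are allowed, while e.g. 1 nop gives 12 bytes, which is not.  Zero nops also passes (offset 8), which matches the patch only checking when patch_area_entry > 0.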