On 20/04/16 02:25, AKASHI Takahiro wrote: > On Tue, Apr 19, 2016 at 09:39:39AM +0300, Alexander Monakov wrote: >> On Tue, 19 Apr 2016, AKASHI Takahiro wrote: >>>>> But if Szabolcs' two-instruction >>>>> sequence in the adjacent subthread is sufficient, this is moot. >>>> >>>> . It can also be solved by having just one NOP after the function label, >>>> and a number of them before, then no thread can be in the nop pad. That >>>> seems to indicate that GCC should not try to be too clever and simply >>>> leave the specified number of nops before and after the function label, >>>> leaving safety measures to the patching infrastructure. >>> >>> I don't get this idea very well. >>> How can the instructions *before* a function label be executed >>> after branching into this function? >> >> The single nop after the function label is changed to a short backwards >> branch >> to the instructions just before the function label. >> >> As a result, the last instruction in the pad would have to become a short >> forward branch jumping over the backwards branch described above, to the >> first >> real instruction of the function. > > So you mean something like: > 1: > str x30, [sp, #-8]! > bl _tracefunc > ldr x30, [sp], #8 > b 2f > .global <function label> > b 1b > 2: > <function prologue/body> > ... > (We will not have to use x9 or else to preserve x30 here.) > > Interesting. > Livepatch code in the kernel has an assumption that the address of > "bl _tracefunc" be equal to <function label>, but a recent patch for > power pc to support livepatch tries to ease this restriction [1], > and so hopefully it won't be an issue. > (I will have to dig into the kernel code to be sure that there is > no other issues though.) >
i think ldr x30,[sp],#8 after the _tracefunc is not ok for livepatching, since _tracefunc will change the return address to the new function to hijack the call, which will not restore the stack (this can be solved if the new function can be instrumented, but fiddly). and sp has to be 16 byte aligned, so the options are str x30,[sp,#-16]! bl _tracefunc or mov x9,x30 bl _tracefunc where _tracefunc is responsible for restoring x30 and sp, and this sequence can come before or after the function symbol. if it's before then 1: <save x30> bl _tracefunc b 2f func: b 1b 2: <prologue> the trace disabled case is better (only one nop), but i think it would mean more kernel work (the current code assumes bl _tracefunc is nopped, so whenever tracing is enabled a different tracefunc target may be used in the atomic update, i don't know if this is necessary though). it is probably only worth inventing something new for aarch64 in gcc if the kernel can use that consistently across targets or if that can cover other significant use cases, but it's not clear if the various flexible nop padding solutions can be more useful than the simple two instruction sequence which kernel tools can already deal with. so it seems to me that func: mov x9, x30 bl __fentry__ <prologue> is still the best option with a new -mfentry option for aarch64 (then we can keep the default -pg behaviour for backward compatibility and work similarly to x86 with -mfentry) it does not solve the more general instrumentation problem, but that would require more analysis. (on x86, gcc also provides -mrecord-mcount and -mnop-mcount to record the noped out mcount call sites, but the kernel seems to use its own tool to do that by looking for the mcount/fentry call relocs so they are probably not needed). > Thanks, > -Takahiro AKASHI > > [1] http://lkml.iu.edu//hypermail/linux/kernel/1604.1/04111.html and > http://lkml.iu.edu//hypermail/linux/kernel/1604.1/04112.html > >> Alexander >