Re: [PATCH v6 6/8] x86/module: prepare module loading for ROX allocations of text

2024-10-17 Thread Peter Zijlstra
On Wed, Oct 16, 2024 at 05:01:28PM -0400, Steven Rostedt wrote:
> On Wed, 16 Oct 2024 15:24:22 +0300
> Mike Rapoport  wrote:
> 
> > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > index 8da0e66ca22d..b498897b213c 100644
> > --- a/arch/x86/kernel/ftrace.c
> > +++ b/arch/x86/kernel/ftrace.c
> > @@ -118,10 +118,13 @@ ftrace_modify_code_direct(unsigned long ip, const 
> > char *old_code,
> > return ret;
> >  
> > /* replace the text with the new text */
> > -   if (ftrace_poke_late)
> > +   if (ftrace_poke_late) {
> > text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL);
> > -   else
> > -   text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
> > +   } else {
> > +   mutex_lock(&text_mutex);
> > +   text_poke((void *)ip, new_code, MCOUNT_INSN_SIZE);
> > +   mutex_unlock(&text_mutex);
> > +   }
> > return 0;
> >  }
> 
> So this slows down the boot by over 30ms. That may not sound like much, but
> we care very much about boot times. This code is serialized with boot and
> runs whenever ftrace is configured in the kernel. The way I measured this,
> was that I added:
> 

> If this is only needed for module load, can we at least still use the
> text_poke_early() at boot up?

Right, so I don't understand why this is needed at all.
ftrace_module_init() runs before complete_formation() which normally
switches to ROX, as such ftrace should be able to continue to do direct
modifications here.

Which reminds me, at some point I did patches adding a
MODULE_STATE_UNFORMED callback in order for static_call / jump_label to
be able to avoid the expensive patching on module load as well (arguably
ftrace should be using that too, instead of a custom callback).

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH v6 6/8] x86/module: prepare module loading for ROX allocations of text

2024-10-17 Thread Mike Rapoport
On Thu, Oct 17, 2024 at 11:35:15AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 16, 2024 at 05:01:28PM -0400, Steven Rostedt wrote:
> > On Wed, 16 Oct 2024 15:24:22 +0300
> > Mike Rapoport  wrote:
> > 
> > > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > > index 8da0e66ca22d..b498897b213c 100644
> > > --- a/arch/x86/kernel/ftrace.c
> > > +++ b/arch/x86/kernel/ftrace.c
> > > @@ -118,10 +118,13 @@ ftrace_modify_code_direct(unsigned long ip, const 
> > > char *old_code,
> > >   return ret;
> > >  
> > >   /* replace the text with the new text */
> > > - if (ftrace_poke_late)
> > > + if (ftrace_poke_late) {
> > >   text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL);
> > > - else
> > > - text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
> > > + } else {
> > > + mutex_lock(&text_mutex);
> > > + text_poke((void *)ip, new_code, MCOUNT_INSN_SIZE);
> > > + mutex_unlock(&text_mutex);
> > > + }
> > >   return 0;
> > >  }
> > 
> > So this slows down the boot by over 30ms. That may not sound like much, but
> > we care very much about boot times. This code is serialized with boot and
> > runs whenever ftrace is configured in the kernel. The way I measured this,
> > was that I added:
> > 
> 
> > If this is only needed for module load, can we at least still use the
> > text_poke_early() at boot up?
> 
> Right, so I don't understand why this is needed at all.
> ftrace_module_init() runs before complete_formation() which normally
> switches to ROX, as such ftrace should be able to continue to do direct
> modifications here.

With this series the module text is allocated as ROX at the first place, so
the modifications ftrace does to module text have to either use text poking
even before complete_formation() or deal with a writable copy like I did
for relocations and alternatives.

I've been carrying the ftrace changes from a very old prototype and
didn't pay enough attention to them them until Steve's complaint.

I'll look into it.
 
> Which reminds me, at some point I did patches adding a
> MODULE_STATE_UNFORMED callback in order for static_call / jump_label to
> be able to avoid the expensive patching on module load as well (arguably
> ftrace should be using that too, instead of a custom callback).
> 

-- 
Sincerely yours,
Mike.

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH v6 6/8] x86/module: prepare module loading for ROX allocations of text

2024-10-17 Thread Steven Rostedt
On Thu, 17 Oct 2024 14:25:05 +0300
Mike Rapoport  wrote:

> With this series the module text is allocated as ROX at the first place, so
> the modifications ftrace does to module text have to either use text poking
> even before complete_formation() or deal with a writable copy like I did
> for relocations and alternatives.
> 
> I've been carrying the ftrace changes from a very old prototype and
> didn't pay enough attention to them them until Steve's complaint.
> 
> I'll look into it.

I just posted a patch where you can see the effects of these changes with
respect to ftrace patching times.

  https://lore.kernel.org/all/20241017113105.1edfa...@gandalf.local.home/

I'll be adding this to the next merge window.

-- Steve

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH v6 6/8] x86/module: prepare module loading for ROX allocations of text

2024-10-17 Thread Steven Rostedt
On Wed, 16 Oct 2024 17:01:28 -0400
Steven Rostedt  wrote:

> If this is only needed for module load, can we at least still use the
> text_poke_early() at boot up?
> 
>   if (ftrace_poke_late) {
>   text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL);
>   } else if (system_state == SYSTEM_BOOTING) {
>   text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
>   } else {
>   mutex_lock(&text_mutex);
>   text_poke((void *)ip, new_code, MCOUNT_INSN_SIZE);
>   mutex_unlock(&text_mutex);
>   }
> 
> ?
> 
> The above if statement looks to slow things down just slightly, but only by
> 2ms, which is more reasonable.

I changed the above to this (yes it's a little hacky) and got my 2ms back!

-- Steve

DEFINE_STATIC_KEY_TRUE(ftrace_modify_boot);

static int __init ftrace_boot_init_done(void)
{
static_branch_disable(&ftrace_modify_boot);
return 0;
}
/* Ftrace updates happen before core init */
core_initcall(ftrace_boot_init_done);

/*
 * Marked __ref because it calls text_poke_early() which is .init.text. That is
 * ok because that call will happen early, during boot, when .init sections are
 * still present.
 */
static int __ref
ftrace_modify_code_direct(unsigned long ip, const char *old_code,
  const char *new_code)
{
int ret = ftrace_verify_code(ip, old_code);

if (ret)
return ret;

/* replace the text with the new text */
if (static_branch_unlikely(&ftrace_modify_boot)) {
text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
} else if (ftrace_poke_late) {
text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL);
} else {
mutex_lock(&text_mutex);
text_poke((void *)ip, new_code, MCOUNT_INSN_SIZE);
mutex_unlock(&text_mutex);
}
return 0;
}

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc