Re: [PATCH RFC v2 03/29] mm: asi: Introduce ASI core API

2025-02-19 Thread Borislav Petkov
On Fri, Jan 10, 2025 at 06:40:29PM +, Brendan Jackman wrote:
> Subject: Re: [PATCH RFC v2 03/29] mm: asi: Introduce ASI core API

x86/asi: ...

> Introduce core API for Address Space Isolation (ASI).  Kernel address
> space isolation provides the ability to run some kernel
> code with a restricted kernel address space.
> 
> There can be multiple classes of such restricted kernel address spaces
> (e.g. KPTI, KVM-PTI etc.). Each ASI class is identified by an index.
> The ASI class can register some hooks to be called when
> entering/exiting the restricted address space.
> 
> Currently, there is a fixed maximum number of ASI classes supported.
> In addition, each process can have at most one restricted address space
> from each ASI class. Neither of these are inherent limitations and
> are merely simplifying assumptions for the time being.
> 
> To keep things simpler for the time being, we disallow context switches

Please use passive voice in your commit message: no "we" or "I", etc,
and describe your changes in imperative mood.

Also, see section "Changelog" in
Documentation/process/maintainer-tip.rst

> within the restricted address space. In the future, we would be able to
> relax this limitation for the case of context switches to different
> threads within the same process (or to the idle thread and back).
> 
> Note that this doesn't really support protecting sibling VM guests
> within the same VMM process from one another. From first principles
> it seems unlikely that anyone who cares about VM isolation would do
> that, but there could be a use-case to think about. In that case need
> something like the OTHER_MM logic might be needed, but specific to
> intra-process guest separation.
> 
> [0]:
> https://lore.kernel.org/kvm/1562855138-19507-1-git-send-email-alexandre.char...@oracle.com
> 
> Notes about RFC-quality implementation details:
> 
>  - Ignoring checkpatch.pl AVOID_BUG.
>  - The dynamic registration of classes might be pointless complexity.
>This was kept from RFCv1 without much thought.
>  - The other-mm logic is also perhaps overly complex, suggestions are
>welcome for how best to tackle this (or we could just forget about
>it for the moment, and rely on asi_exit() happening in process
>switch).
>  - The taint flag definitions would probably be clearer with an enum or
>something.
> 
> Checkpatch-args: --ignore=AVOID_BUG,COMMIT_LOG_LONG_LINE,EXPORT_SYMBOL
> Co-developed-by: Ofir Weisse 
> Signed-off-by: Ofir Weisse 
> Co-developed-by: Junaid Shahid 
> Signed-off-by: Junaid Shahid 
> Signed-off-by: Brendan Jackman 
> ---
>  arch/x86/include/asm/asi.h   | 208 +++
>  arch/x86/include/asm/processor.h |   8 +
>  arch/x86/mm/Makefile |   1 +
>  arch/x86/mm/asi.c| 350 
> +++
>  arch/x86/mm/init.c   |   3 +-
>  arch/x86/mm/tlb.c|   1 +
>  include/asm-generic/asi.h|  67 
>  include/linux/mm_types.h |   7 +
>  kernel/fork.c|   3 +
>  kernel/sched/core.c  |   9 +
>  mm/init-mm.c |   4 +
>  11 files changed, 660 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
> new file mode 100644
> index 
> ..7cc635b6653a3970ba9dbfdc9c828a470e27bd44
> --- /dev/null
> +++ b/arch/x86/include/asm/asi.h
> @@ -0,0 +1,208 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_ASI_H
> +#define _ASM_X86_ASI_H
> +
> +#include 
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
> +
> +/*
> + * Overview of API usage by ASI clients:
> + *
> + * Setup: First call asi_init() to create a domain. At present only one 
> domain
> + * can be created per mm per class, but it's safe to asi_init() this domain
> + * multiple times. For each asi_init() call you must call asi_destroy() AFTER
> + * you are certain all CPUs have exited the restricted address space (by
> + * calling asi_exit()).
> + *
> + * Runtime usage:
> + *
> + * 1. Call asi_enter() to switch to the restricted address space. This can't 
> be
> + *from an interrupt or exception handler and preemption must be disabled.
> + *
> + * 2. Execute untrusted code.
> + *
> + * 3. Call asi_relax() to inform the ASI subsystem that untrusted code 
> execution
> + *is finished. This doesn't cause any address space change. This can't be
> + *from an interrupt or exception handler and preemption must be disabled.
> + *
> + * 4. Either:
> + *
> + *a. Go back to 1.
> + *
> + *b. Call asi_exit() before returning to userspace. This immediately
> + *   switches to the unrestricted address space.

So only from reading this, it does sound weird. Maybe the code does it
differently - I'll see soon - but this basically says:

I asi_enter(), do something, asi_relax() and then I decide to do something

Re: [PATCH RFC v2 03/29] mm: asi: Introduce ASI core API

2025-02-19 Thread Brendan Jackman
Argh, sorry, GMail switched back to HTML mode somehow. Maybe I have to
get a proper mail client after all.

Here's the clean version.

On Wed, 19 Feb 2025 at 11:57, Borislav Petkov  wrote:
>
> > + * Runtime usage:
> > + *
> > + * 1. Call asi_enter() to switch to the restricted address space. This 
> > can't be
> > + *from an interrupt or exception handler and preemption must be 
> > disabled.
> > + *
> > + * 2. Execute untrusted code.
> > + *
> > + * 3. Call asi_relax() to inform the ASI subsystem that untrusted code 
> > execution
> > + *is finished. This doesn't cause any address space change. This can't 
> > be
> > + *from an interrupt or exception handler and preemption must be 
> > disabled.
> > + *
> > + * 4. Either:
> > + *
> > + *a. Go back to 1.
> > + *
> > + *b. Call asi_exit() before returning to userspace. This immediately
> > + *   switches to the unrestricted address space.
>
> So only from reading this, it does sound weird. Maybe the code does it
> differently - I'll see soon - but this basically says:
>
> I asi_enter(), do something, asi_relax() and then I decide to do something
> more and to asi_enter() again!? And then I can end it all with a *single*
> asi_exit() call?
>
> Hm, definitely weird API. Why?


OK, sounds like I need to rewrite this explanation! It's only been
read before by people who already knew how this thing worked so this
might take a few attempts to make it clear.

Maybe the best way to make it clear is to explain this with reference
to KVM. At a super high level, That looks like:

ioctl(KVM_RUN) {
enter_from_user_mode()
while !need_userspace_handling() {
asi_enter();  // part 1
vmenter();  // part 2
asi_relax(); // part 3
}
asi _exit(); // part 4b
exit_to_user_mode()
}

So part 4a is just referring to continuation of the loop.

This explanation was written when that was the only user of this API
so it was probably clearer, now we have userspace it seems a bit odd.

With my pseudocode above, does it make more sense? If so I'll try to
think of a better way to explain it.

> /*
>  * Leave the "tense" state if we are in it, i.e. end the critical section. We
>  * will stay relaxed until the next asi_enter.
>  */
> void asi_relax(void);
>
> Yeah, so there's no API functions balance between enter() and relax()...


asi_enter() is actually balanced with asi_relax(). The comment says
"if we are in it" because technically if you call this asi_relax()
outside of the critical section, it's a nop. But, there's no reason to
do that, so we could definitely change the comment and WARN if that
happens.

>
> > +#define ASI_TAINT_OTHER_MM_CONTROL   ((asi_taints_t)BIT(6))
> > +#define ASI_NUM_TAINTS   6
> > +static_assert(BITS_PER_BYTE * sizeof(asi_taints_t) >= ASI_NUM_TAINTS);
>
> Why is this a typedef at all to make the code more unreadable than it needs to
> be? Why not a simple unsigned int or char or whatever you need?


My thinking was just that it's nicer to see asi_taints_t and know that
it means "it holds taint flags and it's big enough" instead of having
to remember the space needed for these flags. But yeah I'm fine with
making it a raw integer type.

> > +#define ASI_TAINTS_CONTROL_MASK \
> > + (ASI_TAINT_USER_CONTROL | ASI_TAINT_GUEST_CONTROL | 
> > ASI_TAINT_OTHER_MM_CONTROL)
> > +
> > +#define ASI_TAINTS_DATA_MASK \
> > + (ASI_TAINT_KERNEL_DATA | ASI_TAINT_USER_DATA | 
> > ASI_TAINT_OTHER_MM_DATA)
> > +
> > +#define ASI_TAINTS_GUEST_MASK (ASI_TAINT_GUEST_DATA | 
> > ASI_TAINT_GUEST_CONTROL)
> > +#define ASI_TAINTS_USER_MASK (ASI_TAINT_USER_DATA | ASI_TAINT_USER_CONTROL)
> > +
> > +/* The taint policy tells ASI how a class interacts with the CPU taints */
> > +struct asi_taint_policy {
> > + /*
> > +  * What taints would necessitate a flush when entering the domain, to
> > +  * protect it from attack by prior domains?
> > +  */
> > + asi_taints_t prevent_control;
>
> So if those necessitate a flush, why isn't this var called "taints_to_flush"
> or whatever which exactly explains what it is?


Well it needs to be disambiguated from the field below (currently
protect_data) but it could be control_to_flush (and data_to_flush).

The downside of that is that having one say "prevent" and one say
"protect" is quite meaningful. prevent_control is describing things we
need to do to protect the system from this domain, protect_data is
about protecting the domain from the system. However, while that
difference is meaningful it might not actually be helpful for the
reader of the code so I'm not wed to it.

Also worth noting that we could just combine these fields. At present
they should have disjoint bits set. But, they're used in separate
contexts and have separate (although conceptually very similar)
meanings, so I think that would reduce clarity.

>
> Spellchecker please. Go over your whole set.


Ack, I've set up a local thingy to spellcheck all my commits so
hopefully

Re: [PATCH] mm/ioremap: Pass pgprot_t to ioremap_prot() instead of unsigned long

2025-02-19 Thread Catalin Marinas
On Tue, Feb 18, 2025 at 03:49:54PM +0530, Anshuman Khandual wrote:
> From: Ryan Roberts 
> 
> ioremap_prot() currently accepts pgprot_val parameter as an unsigned long,
> thus implicitly assuming that pgprot_val and pgprot_t could never be bigger
> than unsigned long. But this assumption soon will not be true on arm64 when
> using D128 pgtables. In 128 bit page table configuration, unsigned long is
> 64 bit, but pgprot_t is 128 bit.
> 
> Passing platform abstracted pgprot_t argument is better as compared to size
> based data types. Let's change the parameter to directly pass pgprot_t like
> another similar helper generic_ioremap_prot().
> 
> Without this change in place, D128 configuration does not work on arm64 as
> the top 64 bits gets silently stripped when passing the protection value to
> this function.
> 
> Cc: Andrew Morton 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-snps-arc@lists.infradead.org
> Cc: linux-ri...@lists.infradead.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-par...@vger.kernel.org
> Cc: linux-c...@vger.kernel.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux-a...@vger.kernel.org
> Cc: loonga...@lists.linux.dev
> Cc: linux...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Ryan Roberts 
> Co-developed-by: Anshuman Khandual 
> Signed-off-by: Anshuman Khandual 

For arm64:

Acked-by: Catalin Marinas 

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH v2 1/1] mm: pgtable: fix pte_swp_exclusive

2025-02-19 Thread Sam James
Lovely cleanup and a great suggestion from Al.

Reviewed-by: Sam James 

I'd suggest adding a:
Suggested-by: Al Viro 

thanks,
sam

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH v6 2/6] syscall.h: add syscall_set_arguments()

2025-02-19 Thread Maciej W. Rozycki
On Mon, 17 Feb 2025, Dmitry V. Levin wrote:

> This function is going to be needed on all HAVE_ARCH_TRACEHOOK
> architectures to implement PTRACE_SET_SYSCALL_INFO API.
> 
> This partially reverts commit 7962c2eddbfe ("arch: remove unused
> function syscall_set_arguments()") by reusing some of old
> syscall_set_arguments() implementations.
> 
> Signed-off-by: Dmitry V. Levin 
> Tested-by: Charlie Jenkins 
> Reviewed-by: Charlie Jenkins 
> Acked-by: Helge Deller  # parisc
> ---
>  arch/arc/include/asm/syscall.h| 14 +++
>  arch/arm/include/asm/syscall.h| 13 ++
>  arch/arm64/include/asm/syscall.h  | 13 ++
>  arch/csky/include/asm/syscall.h   | 13 ++
>  arch/hexagon/include/asm/syscall.h|  7 ++
>  arch/loongarch/include/asm/syscall.h  |  8 ++
>  arch/mips/include/asm/syscall.h   | 32 
>  arch/nios2/include/asm/syscall.h  | 11 
>  arch/openrisc/include/asm/syscall.h   |  7 ++
>  arch/parisc/include/asm/syscall.h | 12 +
>  arch/powerpc/include/asm/syscall.h| 10 
>  arch/riscv/include/asm/syscall.h  |  9 +++
>  arch/s390/include/asm/syscall.h   |  9 +++
>  arch/sh/include/asm/syscall_32.h  | 12 +
>  arch/sparc/include/asm/syscall.h  | 10 
>  arch/um/include/asm/syscall-generic.h | 14 +++
>  arch/x86/include/asm/syscall.h| 36 +++
>  arch/xtensa/include/asm/syscall.h | 11 
>  include/asm-generic/syscall.h | 16 
>  19 files changed, 257 insertions(+)
> 
> diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
> index 9709256e31c8..89c1e1736356 100644
> --- a/arch/arc/include/asm/syscall.h
> +++ b/arch/arc/include/asm/syscall.h
> @@ -67,6 +67,20 @@ syscall_get_arguments(struct task_struct *task, struct 
> pt_regs *regs,
>   }
>  }
>  
> +static inline void
> +syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
> +   unsigned long *args)
> +{
> + unsigned long *inside_ptregs = ®s->r0;
> + unsigned int n = 6;
> + unsigned int i = 0;
> +
> + while (n--) {
> + *inside_ptregs = args[i++];
> + inside_ptregs--;
> + }
> +}
> +
>  static inline int
>  syscall_get_arch(struct task_struct *task)
>  {
> diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
> index fe4326d938c1..21927fa0ae2b 100644
> --- a/arch/arm/include/asm/syscall.h
> +++ b/arch/arm/include/asm/syscall.h
> @@ -80,6 +80,19 @@ static inline void syscall_get_arguments(struct 
> task_struct *task,
>   memcpy(args, ®s->ARM_r0 + 1, 5 * sizeof(args[0]));
>  }
>  
> +static inline void syscall_set_arguments(struct task_struct *task,
> +  struct pt_regs *regs,
> +  const unsigned long *args)
> +{
> + memcpy(®s->ARM_r0, args, 6 * sizeof(args[0]));
> + /*
> +  * Also copy the first argument into ARM_ORIG_r0
> +  * so that syscall_get_arguments() would return it
> +  * instead of the previous value.
> +  */
> + regs->ARM_ORIG_r0 = regs->ARM_r0;
> +}
> +
>  static inline int syscall_get_arch(struct task_struct *task)
>  {
>   /* ARM tasks don't change audit architectures on the fly. */
> diff --git a/arch/arm64/include/asm/syscall.h 
> b/arch/arm64/include/asm/syscall.h
> index ab8e14b96f68..76020b66286b 100644
> --- a/arch/arm64/include/asm/syscall.h
> +++ b/arch/arm64/include/asm/syscall.h
> @@ -73,6 +73,19 @@ static inline void syscall_get_arguments(struct 
> task_struct *task,
>   memcpy(args, ®s->regs[1], 5 * sizeof(args[0]));
>  }
>  
> +static inline void syscall_set_arguments(struct task_struct *task,
> +  struct pt_regs *regs,
> +  const unsigned long *args)
> +{
> + memcpy(®s->regs[0], args, 6 * sizeof(args[0]));
> + /*
> +  * Also copy the first argument into orig_x0
> +  * so that syscall_get_arguments() would return it
> +  * instead of the previous value.
> +  */
> + regs->orig_x0 = regs->regs[0];
> +}
> +
>  /*
>   * We don't care about endianness (__AUDIT_ARCH_LE bit) here because
>   * AArch64 has the same system calls both on little- and big- endian.
> diff --git a/arch/csky/include/asm/syscall.h b/arch/csky/include/asm/syscall.h
> index 0de5734950bf..30403f7a0487 100644
> --- a/arch/csky/include/asm/syscall.h
> +++ b/arch/csky/include/asm/syscall.h
> @@ -59,6 +59,19 @@ syscall_get_arguments(struct task_struct *task, struct 
> pt_regs *regs,
>   memcpy(args, ®s->a1, 5 * sizeof(args[0]));
>  }
>  
> +static inline void
> +syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
> +   const unsigned long *args)
> +{
> + memcpy(®s->a0, args, 6 * sizeof(regs->a0));
> + /*
> +  * Also copy the first argument into orig_x0
   

Re: [PATCH v6 3/6] syscall.h: introduce syscall_set_nr()

2025-02-19 Thread Maciej W. Rozycki
On Mon, 17 Feb 2025, Dmitry V. Levin wrote:

> diff --git a/arch/mips/include/asm/syscall.h b/arch/mips/include/asm/syscall.h
> index ea050b23d428..b956b015641c 100644
> --- a/arch/mips/include/asm/syscall.h
> +++ b/arch/mips/include/asm/syscall.h
> @@ -41,6 +41,20 @@ static inline long syscall_get_nr(struct task_struct *task,
>   return task_thread_info(task)->syscall;
>  }
>  
> +static inline void syscall_set_nr(struct task_struct *task,
> +   struct pt_regs *regs,
> +   int nr)
> +{
> + /*
> +  * New syscall number has to be assigned to regs[2] because
> +  * syscall_trace_entry() loads it from there unconditionally.

 That label is called `trace_a_syscall' in arch/mips/kernel/scall64-o32.S 
instead.  To bring some order and avoid an inaccuracy here should the odd 
one be matched to the other three?

> +  *
> +  * Consequently, if the syscall was indirect and nr != __NR_syscall,
> +  * then after this assignment the syscall will cease to be indirect.
> +  */
> + task_thread_info(task)->syscall = regs->regs[2] = nr;
> +}
> +
>  static inline void mips_syscall_update_nr(struct task_struct *task,
> struct pt_regs *regs)
>  {

 Otherwise:

Reviewed-by: Maciej W. Rozycki 

for this part, thank you!

  Maciej

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc