hstate_inode() is hugetlbfs-specific, limiting
hugetlb_add_to_page_cache() to hugetlbfs.
hugetlb_filemap_add_folio() allows hstate to be specified and further
separates hugetlb from hugetlbfs.
Signed-off-by: Ackerley Tng
---
include/linux/hugetlb.h | 2 ++
mm/hugetlb.c| 13
parameter for these accounting functions since the
inode's block counts need to be updated during accounting.
The inode's resv_map will also still need to be updated if not NULL.
Signed-off-by: Ackerley Tng
---
fs/hugetlbfs/inode.c| 59 -
inc
-by: Ackerley Tng
---
fs/hugetlbfs/inode.c    |  2 +-
include/linux/hugetlb.h |  6 --
mm/hugetlb.c            | 37 +
3 files changed, 26 insertions(+), 19 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 0fc49b6252e4
Add tests for 2MB and 1GB page sizes.
Signed-off-by: Ackerley Tng
---
.../testing/selftests/kvm/guest_memfd_test.c | 33 ++-
1 file changed, 24 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c
b/tools/testing/selftests/kvm
This will allow preparation steps to be shared
Signed-off-by: Ackerley Tng
---
include/linux/mm.h | 1 +
mm/truncate.c | 24 ++--
2 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1f79667824eb..7a8f6b810de0
decoupling hugetlb from hugetlbfs.
Signed-off-by: Ackerley Tng
---
mm/hugetlb.c | 184 +++
1 file changed, 99 insertions(+), 85 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d16c6417b90f..d943f83d15a9 100644
--- a/mm/hugetlb.c
+++ b/mm
Cleanup in kvm_gmem_release() should be the reverse of
kvm_gmem_create_file().
Cleanup in kvm_gmem_evict_inode() should be the reverse of
kvm_gmem_create_inode().
Signed-off-by: Ackerley Tng
---
virt/kvm/guest_mem.c | 105 +--
1 file changed, 71
Parametrize alloc_hugetlb_folio_from_subpool() by resv_map to remove
the use of vma_resv_map() and decouple hugetlb from hugetlbfs.
Signed-off-by: Ackerley Tng
---
include/linux/hugetlb.h | 2 +-
mm/hugetlb.c            | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a
tialized on boot or on first gmem file creation?
+ Or is one subpool per gmem file fine?
2. Should resv_map be used for gmem at all, since gmem doesn't allow userspace
reservations?
[1] https://lore.kernel.org/lkml/zem5zq8oo+xna...@google.com/
---
Ackerley Tng (19):
mm: hugetlb:
Parametrize remove_mapping_hugepages() and hugetlb_unreserve_pages()
by resv_map to remove the use of inode_resv_map() and decouple hugetlb
from hugetlbfs.
Signed-off-by: Ackerley Tng
---
fs/hugetlbfs/inode.c| 16 ++--
include/linux/hugetlb.h | 6 --
mm/hugetlb.c
subpool_inode() and hstate_inode() are hugetlbfs-specific.
By allowing subpool and hstate to be specified, hugetlb is further
modularized from hugetlbfs.
Signed-off-by: Ackerley Tng
---
include/linux/hugetlb.h | 3 +++
mm/hugetlb.c| 16
2 files changed, 15
Expose inode_resv_map() so that hugetlbfs can access its own resv_map.
Hide restore_reserve_on_error_vma(), since that function is now only
used within mm/hugetlb.c.
Signed-off-by: Ackerley Tng
---
fs/hugetlbfs/inode.c| 2 +-
include/linux/hugetlb.h | 21 +++--
mm/hugetlb.c
First create a gmem inode, then create a gmem file using the inode,
then install the file into an fd.
Creating the file in layers separates inode concepts (struct kvm_gmem)
from file concepts and makes cleaning up in stages neater.
Signed-off-by: Ackerley Tng
---
virt/kvm/guest_mem.c | 86
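The layering described in this patch (inode first, then file, then fd) pairs naturally with teardown in the opposite order. The following is only a userspace sketch of that pattern, not code from the patch: `struct trace`, `record()`, `create_layers()` and the `fail_at` parameter are hypothetical names used to make the unwind order observable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the three layers; none of these names are from the patch. */
enum { LOG_CAP = 16 };

struct trace {
	const char *events[LOG_CAP];
	size_t n;
};

static void record(struct trace *t, const char *event)
{
	assert(t->n < LOG_CAP);
	t->events[t->n++] = event;
}

/*
 * Build the layers in order (inode, then file, then fd). fail_at selects a
 * stage that fails (1 = inode, 2 = file, 3 = fd, 0 = none) so the unwind
 * path can be observed. Returns 0 on success, -1 on failure.
 */
static int create_layers(struct trace *t, int fail_at)
{
	if (fail_at == 1)
		return -1;
	record(t, "inode+");

	if (fail_at == 2)
		goto err_inode;
	record(t, "file+");

	if (fail_at == 3)
		goto err_file;
	record(t, "fd+");
	return 0;

	/* Labels are stacked so each stage undoes only what was already built. */
err_file:
	record(t, "file-");
err_inode:
	record(t, "inode-");
	return -1;
}
```

Calling `create_layers(&t, 3)` on a zeroed `struct trace` records the inode and file being set up and then torn down in reverse, which is the property the commit message asks of the release and evict paths.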
Expose get_hstate_idx() so it can be used from KVM's guest_mem code
Signed-off-by: Ackerley Tng
---
fs/hugetlbfs/inode.c    |  9 -
include/linux/hugetlb.h | 14 ++
2 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/in
First stage of hugetlb support: add initialization and cleanup
routines
Signed-off-by: Ackerley Tng
---
include/uapi/linux/kvm.h | 25
virt/kvm/guest_mem.c | 88 +---
2 files changed, 108 insertions(+), 5 deletions(-)
diff --git a/include
Adds support for various types of backing sources for private
memory (in the sense of confidential computing), similar to the
backing sources available for shared memory.
Signed-off-by: Ackerley Tng
---
.../testing/selftests/kvm/include/test_util.h | 14
tools/testing/selftests/kvm/lib
Introduce kvm_gmem_hugetlb_get_folio(), then update
kvm_gmem_allocate() and kvm_gmem_truncate() to use hugetlb functions.
Signed-off-by: Ackerley Tng
---
virt/kvm/guest_mem.c | 215 +--
1 file changed, 188 insertions(+), 27 deletions(-)
diff --git a/virt
Update private_mem_conversions_test for various private memory backing
source types
Signed-off-by: Ackerley Tng
---
.../kvm/x86_64/private_mem_conversions_test.c | 38 ++-
1 file changed, 28 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64
Zeroing of pages is generalizable to hugetlb and is not specific to
hugetlbfs.
Rename hugetlbfs_zero_partial_page => hugetlb_zero_partial_page, move
it to mm/hugetlb.c and expose it in linux/hugetlb.h.
Signed-off-by: Ackerley Tng
---
fs/hugetlbfs/inode.c|
TODO may want to move this to hugetlb
Signed-off-by: Ackerley Tng
---
fs/hugetlbfs/inode.c    | 3 +--
include/linux/hugetlb.h | 4
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3dab50d3ed88..4f25df31ae80 100644
--- a/fs
t 03:40:26PM +0800,
> > > Chao Peng wrote:
> > >
> > > > On Wed, Mar 08, 2023 at 12:13:24AM +, Ackerley Tng wrote:
> > > > > Chao Peng writes:
> > > > >
> > > > > > On Sat, Jan 14, 2023 at 12:01:01AM +, Sean
Chri
Sean Christopherson writes:
On Thu, Apr 13, 2023, Christian Brauner wrote:
On Thu, Aug 18, 2022 at 04:24:21PM +0300, Kirill A . Shutemov wrote:
> On Wed, Aug 17, 2022 at 10:40:12PM -0700, Hugh Dickins wrote:
> > Here's what I would prefer, and imagine much easier for you to maintain;
> > bu
Refactor out mpol_init_from_nodemask() to simplify logic in do_mbind().
mpol_init_from_nodemask() will be used to perform similar
functionality in do_memfd_restricted_bind() in a later patch.
Signed-off-by: Ackerley Tng
---
mm/mempolicy.c | 32 +---
1 file changed
mpol_create builds a mempolicy based on mode, nmask and maxnode.
mpol_create is exposed for use in memfd_restricted_bind() in a later
patch.
Signed-off-by: Ackerley Tng
---
include/linux/mempolicy.h | 2 ++
mm/mempolicy.c| 39 +++
2 files
Refactor out __mpol_set_shared_policy() to remove dependency on struct
vm_area_struct, since only 2 parameters from struct vm_area_struct are
used.
__mpol_set_shared_policy() will be used in a later patch by
restrictedmem_set_shared_policy().
Signed-off-by: Ackerley Tng
---
include/linux
allocated
on (e.g. /proc/pid/numa_maps) cannot be used.
This selftest adds a small kernel module that overloads the ioctl
syscall on /proc/restrictedmem to request a restrictedmem page and get
the node it was allocated on. The page is freed within the ioctl handler.
Signed-off-by: Ackerley Tng
---
tools
lkml/cover.1681176340.git.ackerley...@google.com/T/
[3] https://github.com/chao-p/linux/commits/privmem-v11.5
---
Ackerley Tng (6):
mm: shmem: Refactor out shmem_shared_policy() function
mm: mempolicy: Refactor out mpol_init_from_nodemask
mm: mempolicy: Refactor out __mpol_set_shared_pol
are supported.
This syscall is specialised just for restrictedmem files because this
functionality is not required by other files.
Signed-off-by: Ackerley Tng
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/mempolicy.h
Refactor out shmem_shared_policy() to allow reading of a file's shared
mempolicy
Signed-off-by: Ackerley Tng
---
include/linux/shmem_fs.h | 7 +++
mm/shmem.c | 10 ++
2 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/include/linux/shmem_fs.h b/in
Christian Brauner writes:
On Wed, Apr 05, 2023 at 09:58:44PM +, Ackerley Tng wrote:
...
> > Why do you even need this flag? It seems that @mount_fd being < 0 is
> > sufficient to indicate that a new restricted memory fd is supposed to be
> > created in the sys
Chao Peng writes:
From: "Kirill A. Shutemov"
Introduce 'memfd_restricted' system call with the ability to create
memory areas that are restricted from userspace access through ordinary
MMU operations (e.g. read/write/mmap). The memory content is expected to
be used through the new in-kernel
David Hildenbrand writes:
On 01.04.23 01:50, Ackerley Tng wrote:
For memfd_restricted() calls without a userspace mount, the backing
file should be the shmem mount in the kernel, and the size of backing
pages should be as defined by system-wide shmem configuration.
If a userspace mount is
-p/linux/commits/privmem-v11.5
Links to earlier patch series:
+ RFC v3:
https://lore.kernel.org/lkml/cover.1680306489.git.ackerley...@google.com/T/
+ RFC v2:
https://lore.kernel.org/lkml/cover.1679428901.git.ackerley...@google.com/T/
+ RFC v1:
https://lore.kernel.org/lkml/cover.1676507663.git.acker
.
Also includes negative tests for invalid inputs, including fds
representing read-only superblocks/mounts.
Signed-off-by: Ackerley Tng
---
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/memfd_restricted_usermnt.c
or is intended to parallel that of
the openat() syscall.
memfd_restricted() will check that the tmpfs superblock is
writable, and that the mount is also writable, before attempting to
create a restrictedmem file on the mount.
Signed-off-by: Ackerley Tng
---
include/linux/syscalls.h
Thanks for reviewing these patches!
"Kirill A. Shutemov" writes:
On Fri, Mar 31, 2023 at 11:50:39PM +, Ackerley Tng wrote:
...
+static int restrictedmem_create_on_user_mount(int mount_fd)
+{
+ int ret;
+ struct fd f;
+ struct vfsmount *mnt;
+
Thanks for your review!
David Hildenbrand writes:
On 01.04.23 01:50, Ackerley Tng wrote:
...
diff --git a/include/uapi/linux/restrictedmem.h b/include/uapi/linux/restrictedmem.h
new file mode 100644
index ..22d6f2285f6d
--- /dev/null
+++ b/include/uapi/linux
Thanks again for your review!
Christian Brauner writes:
On Tue, Apr 04, 2023 at 03:53:13PM +0200, Christian Brauner wrote:
On Fri, Mar 31, 2023 at 11:50:39PM +, Ackerley Tng wrote:
>
> ...
>
> -SYSCALL_DEFINE1(memfd_restricted, unsigned int, flags)
> +static int restr
Christian Brauner writes:
On Tue, Mar 21, 2023 at 08:15:32PM +, Ackerley Tng wrote:
By default, the backing shmem file for a restrictedmem fd is created
on shmem's kernel space mount.
...
Thanks for reviewing this patch!
This looks like you can just pass in some tmpfs fd an
.
Also includes negative tests for invalid inputs, including fds
representing read-only superblocks/mounts.
Signed-off-by: Ackerley Tng
---
tools/testing/selftests/Makefile | 1 +
.../selftests/restrictedmem/.gitignore| 3 +
.../testing/selftests/restrictedmem/Makefile
...@google.com/T/
+ RFC v1:
https://lore.kernel.org/lkml/cover.1676507663.git.ackerley...@google.com/T/
---
Ackerley Tng (2):
mm: restrictedmem: Allow userspace to specify mount for
memfd_restricted
selftests: restrictedmem: Check hugepage-ness of shmem file backing
restrictedmem fd
include
or is intended to parallel that of
the openat() syscall.
memfd_restricted() will check that the tmpfs superblock is
writable, and that the mount is also writable, before attempting to
create a restrictedmem file on the mount.
Signed-off-by: Ackerley Tng
---
include/linux/syscalls.h
//lore.kernel.org/lkml/diqzzga0fv96@ackerleytng-cloudtop-sg.c.googlers.com/
Links to earlier patch series:
+ RFC v1:
https://lore.kernel.org/lkml/cover.1676507663.git.ackerley...@google.com/T/
Ackerley Tng (2):
mm: restrictedmem: Allow userspace to specify mount for
memfd_restricted
selft
.
Signed-off-by: Ackerley Tng
---
tools/testing/selftests/Makefile | 1 +
.../selftests/restrictedmem/.gitignore| 3 +
.../testing/selftests/restrictedmem/Makefile | 15 +
.../testing/selftests/restrictedmem/common.c | 9 +
.../testing/selftests/restrictedmem/common.h
deled after how sys_open() can create an unnamed
temporary file in a given directory with O_TMPFILE.
This will help restrictedmem fds inherit the properties of the
provided tmpfs mounts, for example, hugepage allocation hints, NUMA
binding hints, etc.
Signed-off-by: Ackerley Tng
---
include/
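For readers unfamiliar with the O_TMPFILE behaviour this commit message refers to, here is a minimal userspace sketch. It shows only the existing open() analogy, not the proposed memfd_restricted() interface; `open_unnamed_in()` is a hypothetical helper name, and whether the call succeeds depends on the filesystem backing the directory.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an unnamed regular file on the filesystem
 * that contains dir. The file has no directory entry, so it inherits the
 * properties of that mount without ever being visible by name.
 * Returns an fd, or -1 with errno set (for example when the filesystem
 * does not support O_TMPFILE).
 */
static int open_unnamed_in(const char *dir)
{
	assert(dir != NULL);
	return open(dir, O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
}
```

On success, fstat() on the returned fd reports a regular file with a link count of zero, and the file disappears when the last fd is closed.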
Signed-off-by: Ackerley Tng
---
tools/testing/selftests/vm/memfd_restricted.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/vm/memfd_restricted.c b/tools/testing/selftests/vm/memfd_restricted.c
index 43a512f273f7..9c4e6a0becbc 100644
--- a/tools/testing
-off-by: Ackerley Tng
---
.../selftests/kvm/set_memory_region_test.c| 29 +--
1 file changed, 26 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c
b/tools/testing/selftests/kvm/set_memory_region_test.c
index cc727d11569e
Default the private/shared memory conversion tests to use a single
file (when multiple memslots are requested), while executing on
multiple vCPUs in parallel, to stress-test the restrictedmem subsystem.
Also add a flag to allow multiple files to be used.
Signed-off-by: Ackerley Tng
---
.../kvm
st, upon a private access to non-private memslot, KVM
should also exit to userspace with KVM_EXIT_MEMORY_FAULT.
Signed-off-by: Ackerley Tng
---
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/x86_64/private_mem_kvm_exits_test.c | 124 ++
2 files changed, 125
Provide new function to allow restrictedmem's fd and offset to be
specified in selftests.
No functional change intended to vm_userspace_mem_region_add.
Signed-off-by: Ackerley Tng
---
.../selftests/kvm/include/kvm_util_base.h | 4 ++
tools/testing/selftests/kvm/lib/kvm_util.c
By running the private/shared memory conversion tests on multiple
vCPUs in parallel, we stress-test the restrictedmem subsystem to
test conversion of non-overlapping GPA ranges in multiple memslots.
Signed-off-by: Ackerley Tng
---
.../kvm/x86_64/private_mem_conversions_test.c | 203
Signed-off-by: Ackerley Tng
---
tools/testing/selftests/vm/memfd_restricted.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vm/memfd_restricted.c b/tools/testing/selftests/vm/memfd_restricted.c
index 3a556b570129..43a512f273f7 100644
--- a/tools
Default the private/shared memory conversion tests to use a single
memslot, while executing on multiple vCPUs in parallel, to stress-test
the restrictedmem subsystem.
Also add a flag to allow multiple memslots to be used.
Signed-off-by: Ackerley Tng
---
.../kvm/x86_64
.
In this test, we exercise fallocate to back and unback memory using
the restrictedmem fd, and we expect no problems (crashes) after the
KVM functions have been unbound.
Signed-off-by: Ackerley Tng
---
.../kvm/x86_64/private_mem_conversions_test.c | 26 ++-
1 file changed, 25
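The "back and unback" operations mentioned in this test description correspond to fallocate() in its default mode and with FALLOC_FL_PUNCH_HOLE. The sketch below illustrates those two calls only; it assumes an ordinary fd such as one from memfd_create() as a stand-in, since a restrictedmem fd is not available outside the patched kernel, and `back_then_unback()` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: back [0, len) of fd with pages, then punch the same
 * range out again. FALLOC_FL_PUNCH_HOLE must be combined with
 * FALLOC_FL_KEEP_SIZE, so the file size is unchanged afterwards.
 * Returns 0 on success, -1 on error with errno set by fallocate().
 */
static int back_then_unback(int fd, off_t len)
{
	assert(len > 0);

	/* Mode 0 allocates backing pages for the range. */
	if (fallocate(fd, 0, 0, len) < 0)
		return -1;

	/* Punching a hole releases the backing pages again. */
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, len);
}
```

A caller could pass the fd from `memfd_create("demo", 0)` and a length of 1 MiB; the file size stays at 1 MiB after the hole is punched.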
accessible to host userspace via the HVA.
Signed-off-by: Ackerley Tng
---
.../kvm/x86_64/private_mem_conversions_test.c | 54 ---
1 file changed, 48 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c
b/tools/testing
/ddd2c92b268a2fdc6158f82a6169ad1a57f2a01d
+ Proposed fix to adjust VM's initial stack address to align with SysV
ABI spec:
https://lore.kernel.org/lkml/20230227180601.104318-1-ackerley...@google.com/
Ackerley Tng (10):
KVM: selftests: Test error message fixes for memfd_restricted
selftests
Chao Peng writes:
On Sat, Jan 14, 2023 at 12:01:01AM +, Sean Christopherson wrote:
On Fri, Dec 02, 2022, Chao Peng wrote:
...
Strongly prefer to use similar logic to existing code that detects wraps:
mem->restricted_offset + mem->memory_size <
mem->restricted_offset
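The wrap check quoted above relies on unsigned arithmetic being defined to wrap in C: if the sum is smaller than one of its operands, the addition overflowed. A minimal standalone sketch, assuming 64-bit unsigned fields; `range_wraps()` and `range_end()` are hypothetical helpers and not part of the patch under review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when offset + size wraps past the top
 * of the 64-bit range. Unsigned addition wraps modulo 2^64, so a sum that
 * is smaller than offset can only result from overflow.
 */
static bool range_wraps(uint64_t offset, uint64_t size)
{
	return offset + size < offset;
}

/* End of the range; callers are expected to have rejected wrapping ranges. */
static uint64_t range_end(uint64_t offset, uint64_t size)
{
	assert(!range_wraps(offset, size));
	return offset + size;
}
```

For example, an offset of `UINT64_MAX - 0xfff` with a size of `0x1000` wraps to zero and is flagged, while the same size at offset `0x1000` is not.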
Chao Peng writes:
Register/unregister private memslot to fd-based memory backing store
restrictedmem and implement the callbacks for restrictedmem_notifier:
- invalidate_start()/invalidate_end() to zap the existing memory
mappings in the KVM page table.
- error() to request KVM_REQ_M
Yuan Yao writes:
On Sat, Feb 18, 2023 at 12:43:00AM +, Ackerley Tng wrote:
Hello,
This patchset builds upon the memfd_restricted() system call that has
been discussed in the ‘KVM: mm: fd-based approach for supporting KVM’
patch series, at
https://lore.kernel.org/lkml
"Kirill A. Shutemov" writes:
On Thu, Feb 16, 2023 at 12:41:16AM +, Ackerley Tng wrote:
By default, the backing shmem file for a restrictedmem fd is created
on shmem's kernel space mount.
With this patch, an optional tmpfs mount can be specified, which will
be used as
Tests that when RMFD_HUGEPAGE is specified, restrictedmem will be
backed by Transparent HugePages.
Signed-off-by: Ackerley Tng
---
.../restrictedmem_hugepage_test.c | 25 +++
1 file changed, 25 insertions(+)
diff --git
a/tools/testing/selftests/restrictedmem
Allow userspace to hint the kernel to use Transparent HugePages to
back restricted memory on a per-file basis.
Signed-off-by: Ackerley Tng
---
include/uapi/linux/restrictedmem.h | 1 +
mm/restrictedmem.c | 27 +--
2 files changed, 18 insertions(+), 10
() syscall
+ Support for per file NUMA binding hints
Ackerley Tng (2):
mm: restrictedmem: Add flag as THP allocation hint for
memfd_restricted() syscall
selftests: restrictedmem: Add selftest for RMFD_HUGEPAGE
include/uapi/linux/restrictedmem.h| 1 +
mm/restrictedmem.c
.
Signed-off-by: Ackerley Tng
---
tools/testing/selftests/Makefile | 1 +
.../selftests/restrictedmem/.gitignore| 3 +
.../testing/selftests/restrictedmem/Makefile | 14 +
.../testing/selftests/restrictedmem/common.c | 9 +
.../testing/selftests/restrictedmem/common.h
/
Future work/TODOs:
+ man page for the memfd_restricted() syscall
+ Support for per file Transparent HugePage allocation hints
+ Support for per file NUMA binding hints
Ackerley Tng (2):
mm: restrictedmem: Allow userspace to specify mount_path for
memfd_restricted
selftests: restrictedmem
r how sys_open() can create an unnamed
temporary file in a given directory with O_TMPFILE.
This will help restrictedmem fds inherit the properties of the
provided tmpfs mounts, for example, hugepage allocation hints, NUMA
binding hints, etc.
Signed-off-by: Ackerley Tng
---
include/linux/sysca
+static int restrictedmem_getattr(struct user_namespace *mnt_userns,
+const struct path *path, struct kstat *stat,
+u32 request_mask, unsigned int query_flags)
+{
+ struct inode *inode = d_inode(path->dentry);
+ struct
> A memslot with KVM_MEM_PRIVATE being set can include both fd-based
> private memory and hva-based shared memory. Architecture code (like TDX
> code) can tell whether the on-going fault is private or not. This patch
> adds a 'is_private' field to kvm_page_fault to indicate this and
> architecture