Re: [PATCH v6 1/2] KVM: guest_memfd: add generic population via write

Nikita Kalyazin Fri, 24 Oct 2025 07:36:26 -0700



On 23/10/2025 17:07, Sean Christopherson wrote:

On Mon, Oct 20, 2025, Nikita Kalyazin wrote:

From: Nikita Kalyazin <[email protected]>


+ Vishal and Ackerley


write syscall populates guest_memfd with user-supplied data in a generic
way, ie no vendor-specific preparation is performed.  If the request is
not page-aligned, the remaining bytes are initialised to 0.

write is only supported for non-CoCo setups where guest memory is not
hardware-encrypted.


Please include all of the "why".  The code mostly communicates the "what", but
it doesn't capture why write() support is at all interesting, nor does it 
explain
why read() isn't supported.


Hi Sean,

Thanks for the review.

Do you think including the explanation from the cover letter would besufficient? Shall I additionally say that read() isn't supportedbecause there is no use case for it as of now or would it be obvious?

Signed-off-by: Nikita Kalyazin <[email protected]>
---
  virt/kvm/guest_memfd.c | 48 ++++++++++++++++++++++++++++++++++++++++++


There's a notable lack of uAPI and Documentation chanegs.  I.e. this needs a
GUEST_MEMFD_FLAG_xxx along with proper documentation.


Would the following be ok in the doc?

When the capability KVM_CAP_GUEST_MEMFD_WRITE is supported, the 'flags'field

supports GUEST_MEMFD_FLAG_WRITE. Setting this flag on guest_memfd creation
enables write() syscall operations to populate guest_memfd memory from host
userspace.

When a write() operation is performed on a guest_memfd file descriptorwith the

GUEST_MEMFD_FLAG_WRITE set, the syscall will populate the guest memory with

user-supplied data in a generic way, without any vendor-specificpreparation.

The write operation is only supported for non-CoCo (Confidential Computing)
setups where guest memory is not hardware-encrypted. If the write request is

not page-aligned, any remaining bytes within the page are initialized tozero.


And while it's definitely it's a-ok to land .write() in advance of the direct 
map
changes, we do need to at least map out how we want the two to interact, e.g. so
that we don't end up with constraints that are impossible to satisfy.

write() shall not attempt to access a page that is not in the directmap, which I believe can be achieved via kvm_kmem_gmem_write_begin()consulting the KVM_GMEM_FOLIO_NO_DIRECT_MAP in folio->private(introduced in [1]).

Do you think we should mention it in the commit message in some way?What particular constraint are you cautious about?


[1] https://lore.kernel.org/kvm/[email protected]/

  1 file changed, 48 insertions(+)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 94bafd6c558c..f4e218049afa 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -380,6 +380,8 @@ static int kvm_gmem_mmap(struct file *file, struct 
vm_area_struct *vma)

  static struct file_operations kvm_gmem_fops = {
       .mmap           = kvm_gmem_mmap,
+     .llseek         = default_llseek,
+     .write_iter     = generic_perform_write,
       .open           = generic_file_open,
       .release        = kvm_gmem_release,
       .fallocate      = kvm_gmem_fallocate,
@@ -390,6 +392,49 @@ void kvm_gmem_init(struct module *module)
       kvm_gmem_fops.owner = module;
  }

+static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
+                                  struct address_space *mapping,
+                                  loff_t pos, unsigned int len,
+                                  struct folio **foliop,
+                                  void **fsdata)


Over-aggressive wrapping, this can be


static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
                                      struct address_space *mapping, loff_t pos,
                                      unsigned int len, struct folio **folio,
                                      void **fsdata)

or

static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
                                      struct address_space *mapping,
                                      loff_t pos, unsigned int len,
                                      struct folio **folio, void **fsdata)

if we want to bundle pos+len.


Ack.

+{
+     struct file *file = kiocb->ki_filp;


ki_filp is already a file, and even if it were a "void *", there's no need for a
local variable.


Ack.

+     struct inode *inode = file_inode(file);
+     pgoff_t index = pos >> PAGE_SHIFT;
+     struct folio *folio;
+
+     if (!kvm_gmem_supports_mmap(inode))


Checking for MMAP is neither sufficient nor strictly necessary.  MMAP doesn't
imply SHARED, and it's not clear to me that mmap() support should be in any way
tied to WRITE support.

As in my reply to the comment about doc, I plan to introduceKVM_CAP_GUEST_MEMFD_WRITE and GUEST_MEMFD_FLAG_WRITE. Thekvm_arch_supports_gmem_write() will be a weak symbol and relying on!kvm_arch_has_private_mem() on x86, similar tokvm_arch_supports_gmem_mmap(). Does it look right?

+             return -ENODEV;
+
+     if (pos + len > i_size_read(inode))
+             return -EINVAL;
+
+     folio = kvm_gmem_get_folio(inode, index);


Eh, since "index" is only used once, my vote is to use "pos" and do the shift
here, so that it's obvious that the input to kvm_gmem_get_folio() is being 
checked.


Ack.

+     if (IS_ERR(folio))
+             return -EFAULT;


Why EFAULT?


Will propagate the error like you suggest below.

+
+     *foliop = folio;


There shouldn't be any need for a local "folio".  What about having the "out"
param be just "folio"?

E.g.

static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb,
                                      struct address_space *mapping,
                                      loff_t pos, unsigned int len,
                                      struct folio **folio, void **fsdata)
{
         struct inode *inode = file_inode(kiocb->ki_filp);

         if (!kvm_gmem_supports_write(inode))
                 return -ENODEV;

         if (pos + len > i_size_read(inode))
                 return -EINVAL;

         *folio = kvm_gmem_get_folio(inode, pos >> PAGE_SHIFT);
         if (IS_ERR(*folio))
                 return PTR_ERR(*folio);

         return 0;
}


Ack.

+     return 0;
+}
+
+static int kvm_kmem_gmem_write_end(const struct kiocb *kiocb,
+                                struct address_space *mapping,
+                                loff_t pos, unsigned int len,
+                                unsigned int copied,
+                                struct folio *folio, void *fsdata)
+{
+     if (copied && copied < len) {


Why check if "copied" is non-zero?  I don't see why KVM should behave 
differently
with respect to unwritten bytes if copy_folio_from_iter_atomic() fails on the
first byte or the Nth byte.

No, I don't think there is a need for this check indeed. It looks likea leftover from my previous changes.

+             unsigned int from = pos & ((1UL << folio_order(folio)) - 1);


Uh, isn't this just offset_in_folio()?

+
+             folio_zero_range(folio, from + copied, len - copied);


I'd probably be in favor of omitting "from" entirely, e.g.

         if (copied < len)
                 folio_zero_range(folio, offset_in_folio(pos) + copied,
                                  len - copied);


Ack.

+     }
+
+     folio_unlock(folio);
+     folio_put(folio);
+
+     return copied;
+}

Re: [PATCH v6 1/2] KVM: guest_memfd: add generic population via write

Reply via email to