On Wed, Jun 5, 2024, 12:02 David Hildenbrand <da...@redhat.com> wrote:

> On 05.06.24 17:19, Stefan Hajnoczi wrote:
> > On Wed, 5 Jun 2024 at 10:29, Stefan Hajnoczi <stefa...@redhat.com>
> wrote:
> >>
> >> On Wed, Jun 05, 2024 at 10:13:32AM +0200, Albert Esteve wrote:
> >>> On Tue, Jun 4, 2024 at 8:54 PM Stefan Hajnoczi <stefa...@redhat.com>
> wrote:
> >>>
> >>>> On Thu, May 30, 2024 at 05:22:23PM +0200, Albert Esteve wrote:
> >>>>> Add SHMEM_MAP/UNMAP requests to vhost-user.
> >>>>>
> >>>>> This request allows backends to dynamically map
> >>>>> fds into a shared memory region identified by
> >>>>
> >>>> Please call this "VIRTIO Shared Memory Region" everywhere (code,
> >>>> vhost-user spec, commit description, etc) so it's clear that this is
> not
> >>>> about vhost-user shared memory tables/regions.
> >>>>
> >>>>> its `shmid`. Then, the fd memory is advertised
> >>>>> to the frontend through a BAR+offset, so it can
> >>>>> be read by the driver while it's valid.
> >>>>
> >>>> Why is a PCI BAR mentioned here? vhost-user does not know about the
> >>>> VIRTIO Transport (e.g. PCI) being used. It's the frontend's job to
> >>>> report VIRTIO Shared Memory Regions to the driver.
> >>>>
> >>>>
> >>> I will remove the PCI BAR reference, as it is true that it depends
> >>> on the transport. I was trying to explain that the driver
> >>> will use shm_base + shm_offset to access
> >>> the mapped memory.
> >>>
> >>>
> >>>>>
> >>>>> Then, the backend can munmap the memory range
> >>>>> in a given shared memory region (again, identified
> >>>>> by its `shmid`), to free it. After this, the
> >>>>> region becomes private and shall not be accessed
> >>>>> by the frontend anymore.
> >>>>
> >>>> What does "private" mean?
> >>>>
> >>>> The frontend must mmap PROT_NONE to reserve the virtual memory space
> >>>> when no fd is mapped in the VIRTIO Shared Memory Region. Otherwise an
> >>>> unrelated mmap(NULL, ...) might use that address range and the guest
> >>>> would have access to the host memory! This is a security issue and
> needs
> >>>> to be mentioned explicitly in the spec.
> >>>>
> >>>
> >>> I mentioned private because it changes the mapping from MAP_SHARED
> >>> to MAP_PRIVATE. I will highlight PROT_NONE instead.
> >>
> >> I see. Then "MAP_PRIVATE" would be clearer. I wasn't sure whether you
> >> meant the mmap flags or something like the memory range no longer being
> >> accessible to the driver.
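
(As an aside, to make the reservation idea concrete: a minimal sketch with
plain POSIX mmap(). The helper names, sizes and protection flags are
illustrative, not taken from the patch.)

  #include <sys/mman.h>
  #include <stddef.h>

  /*
   * Reserve the whole VIRTIO Shared Memory Region up front with PROT_NONE,
   * so an unrelated mmap(NULL, ...) can never land inside that range.
   */
  static void *shmem_region_reserve(size_t region_size)
  {
      return mmap(NULL, region_size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  }

  /* SHMEM_MAP: place the backend's fd at a fixed offset inside the reservation. */
  static void *shmem_region_map(void *base, size_t shm_offset, size_t len,
                                int fd, off_t fd_offset, int prot)
  {
      return mmap((char *)base + shm_offset, len, prot,
                  MAP_SHARED | MAP_FIXED, fd, fd_offset);
  }

  /* SHMEM_UNMAP: cover the hole with an inaccessible anonymous mapping again. */
  static void *shmem_region_unmap(void *base, size_t shm_offset, size_t len)
  {
      return mmap((char *)base + shm_offset, len, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE,
                  -1, 0);
  }

Mapping PROT_NONE back over the hole (instead of a plain munmap()) is what
keeps the address range reserved, which is the point of the security note
above.
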
> >
> > One more thing: please check whether kvm.ko memory regions need to be
> > modified or split to match the SHMEM_MAP mapping's read/write
> > permissions.
> >
> > The VIRTIO Shared Memory Region pages can have PROT_READ, PROT_WRITE,
> > PROT_READ|PROT_WRITE, or PROT_NONE.
> >
> > kvm.ko memory regions are read/write or read-only. I'm not sure what
> > happens when the guest accesses a kvm.ko memory region containing
> > mappings with permissions more restrictive than its kvm.ko memory
> > region.
>
> IIRC, the KVM R/O memory region requests could be used to further reduce
> permissions (assuming your mmap is R/W, you could map it R/O into the KVM
> MMU), but I might remember things incorrectly.
>

I'm thinking about the opposite case where KVM is configured for R/W but
the mmap is more restrictive. This patch series makes this scenario
possible.


>
> > In other words, the kvm.ko memory region would allow the
> > access but the Linux virtual memory configuration would cause a page
> > fault.
> >
> > For example, imagine a QEMU MemoryRegion containing a SHMEM_MAP
> > mapping with PROT_READ. The kvm.ko memory region would be read/write
> > (unless extra steps were taken to tell kvm.ko about the permissions).
> > When the guest stores to the PROT_READ page, kvm.ko will process a
> > fault...and I'm not sure what happens next.
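
(For concreteness, the "extra steps" above would be along the lines of
registering that sub-range as a read-only slot with kvm.ko. A hedged sketch
using the raw KVM ioctl; the slot number and addresses are made up for
illustration, and QEMU would normally do this through its KVM memory
listener rather than by hand.)

  #include <linux/kvm.h>
  #include <sys/ioctl.h>
  #include <stdint.h>

  /*
   * Mark a guest-physical range as read-only for the guest.  Guest writes
   * to a KVM_MEM_READONLY slot are not silently allowed; they exit to user
   * space (architecture-dependent details aside).
   */
  static int kvm_set_readonly_slot(int vm_fd, uint32_t slot, uint64_t gpa,
                                   uint64_t size, void *hva)
  {
      struct kvm_userspace_memory_region region = {
          .slot            = slot,
          .flags           = KVM_MEM_READONLY,
          .guest_phys_addr = gpa,
          .memory_size     = size,
          .userspace_addr  = (uint64_t)(uintptr_t)hva,
      };

      return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
  }
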
> >
> > A similar scenario occurs when a PROT_NONE mapping exists within a
> > kvm.ko memory region. I don't remember how kvm.ko behaves when the
> > guest tries to access the pages.
> >
> > It's worth figuring this out before going further because it could
> > become tricky if issues are discovered later. I have CCed David
> > Hildenbrand in case he knows.
>
>
> One relevant piece is likely:
>
> "When the KVM_CAP_SYNC_MMU capability is available, changes in the
> backing of the memory region are automatically reflected into the guest.
>   For example, an mmap() that affects the region will be made visible
> immediately. "
>
> We can already effectively get R/O or PROT_NONE PTEs in
> PROT_READ|PROT_WRITE mappings, and KVM must be able to handle that
> properly -- triggering a page fault to let core-MM resolve it.
>
> If we have a PROT_NONE VMA and the guest writes to it, we'd likely end
> up (to resolve the KVM MMU page fault) in
> hva_to_pfn_slow()->get_user_pages_unlocked(), which would return -EFAULT.
>
> Not sure if we really inject a page fault into the guest or if the KVM
> run would effectively fail with -EFAULT and make user space unhappy.
> Certainly something to play with!
>

I think KVM probably treats the fault as a misconfiguration that host user
space needs to resolve. There might be no way to inject a page fault into
the guest because the access is allowed according to the guest page tables
and the guest fault handlers aren't prepared to deal with a spurious fault.
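
If it does turn out that the run simply fails, user space would notice it
roughly like this (a hypothetical, bare-bones run loop; in QEMU the
corresponding code path is kvm_cpu_exec()):

  #include <errno.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Minimal run loop fragment; vcpu_fd comes from KVM_CREATE_VCPU. */
  static int run_vcpu_once(int vcpu_fd)
  {
      int ret = ioctl(vcpu_fd, KVM_RUN, 0);

      if (ret < 0 && errno == EFAULT) {
          /*
           * Plausible outcome when the guest touches a PROT_NONE hole inside
           * a R/W memslot: get_user_pages() cannot resolve the fault, so the
           * run fails and it is up to user space to decide what to do.
           */
          fprintf(stderr, "KVM_RUN: unresolvable guest memory access\n");
          return -EFAULT;
      }
      return ret;
  }
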

Let's see what happens in practice. I think the cleanest solution would be
to create separate kvm.ko memory regions with the appropriate permissions
(i.e. this might require creating multiple MemoryRegions inside QEMU for a
single VIRTIO Shared Memory Region when permissions vary between mmaps).
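
Roughly what that could look like, sketched with the QEMU memory API (names
are illustrative, memory_region_init_ram_from_fd()'s signature has changed
across QEMU versions, and error handling is elided, so treat this as a
sketch rather than a drop-in):

  #include "qemu/osdep.h"
  #include "exec/memory.h"
  #include "qapi/error.h"

  /*
   * One MemoryRegion per SHMEM_MAP mapping, so that a read-only mapping can
   * be exposed to kvm.ko as a read-only slot instead of inheriting the
   * permissions of one big read/write region.
   */
  static void shmem_add_mapping(MemoryRegion *shmem_container, const char *name,
                                hwaddr shm_offset, uint64_t len,
                                int fd, uint64_t fd_offset, bool writable)
  {
      MemoryRegion *mr = g_new0(MemoryRegion, 1);

      memory_region_init_ram_from_fd(mr, NULL, name, len,
                                     RAM_SHARED, fd, fd_offset, &error_fatal);
      if (!writable) {
          /* Ends up as a KVM_MEM_READONLY slot via the KVM memory listener. */
          memory_region_set_readonly(mr, true);
      }
      memory_region_add_subregion(shmem_container, shm_offset, mr);
  }

On SHMEM_UNMAP the subregion would then be removed again
(memory_region_del_subregion()).
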

Stefan


> --
> Cheers,
>
> David / dhildenb
>
>
