On Wed, Jun 5, 2024, 12:02 David Hildenbrand <da...@redhat.com> wrote:
> On 05.06.24 17:19, Stefan Hajnoczi wrote:
> > On Wed, 5 Jun 2024 at 10:29, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> >>
> >> On Wed, Jun 05, 2024 at 10:13:32AM +0200, Albert Esteve wrote:
> >>> On Tue, Jun 4, 2024 at 8:54 PM Stefan Hajnoczi <stefa...@redhat.com> wrote:
> >>>
> >>>> On Thu, May 30, 2024 at 05:22:23PM +0200, Albert Esteve wrote:
> >>>>> Add SHMEM_MAP/UNMAP requests to vhost-user.
> >>>>>
> >>>>> This request allows backends to dynamically map
> >>>>> fds into a shared memory region identified by
> >>>>
> >>>> Please call this "VIRTIO Shared Memory Region" everywhere (code,
> >>>> vhost-user spec, commit description, etc) so it's clear that this is not
> >>>> about vhost-user shared memory tables/regions.
> >>>>
> >>>>> its `shmid`. Then, the fd memory is advertised
> >>>>> to the frontend through a BAR+offset, so it can
> >>>>> be read by the driver while it's valid.
> >>>>
> >>>> Why is a PCI BAR mentioned here? vhost-user does not know about the
> >>>> VIRTIO Transport (e.g. PCI) being used. It's the frontend's job to
> >>>> report VIRTIO Shared Memory Regions to the driver.
> >>>>
> >>>>
> >>> I will remove PCI BAR, as it is true that it depends on the
> >>> transport. I was trying to explain that the driver
> >>> will use the shm_base + shm_offset to access
> >>> the mapped memory.
> >>>
> >>>
> >>>>>
> >>>>> Then, the backend can munmap the memory range
> >>>>> in a given shared memory region (again, identified
> >>>>> by its `shmid`), to free it. After this, the
> >>>>> region becomes private and shall not be accessed
> >>>>> by the frontend anymore.
> >>>>
> >>>> What does "private" mean?
> >>>>
> >>>> The frontend must mmap PROT_NONE to reserve the virtual memory space
> >>>> when no fd is mapped in the VIRTIO Shared Memory Region. Otherwise an
> >>>> unrelated mmap(NULL, ...) might use that address range and the guest
> >>>> would have access to the host memory! This is a security issue and needs
> >>>> to be mentioned explicitly in the spec.
> >>>>
> >>>
> >>> I mentioned private because it changes the mapping from MAP_SHARED
> >>> to MAP_PRIVATE. I will highlight PROT_NONE instead.
> >>
> >> I see. Then "MAP_PRIVATE" would be clearer. I wasn't sure whether you
> >> mean mmap flags or something like the memory range is no longer
> >> accessible to the driver.
> >
> > One more thing: please check whether kvm.ko memory regions need to be
> > modified or split to match the SHMEM_MAP mapping's read/write
> > permissions.
> >
> > The VIRTIO Shared Memory Area pages can have PROT_READ, PROT_WRITE,
> > PROT_READ|PROT_WRITE, or PROT_NONE.
> >
> > kvm.ko memory regions are read/write or read-only. I'm not sure what
> > happens when the guest accesses a kvm.ko memory region containing
> > mappings with permissions more restrictive than its kvm.ko memory
> > region.
>
> IIRC, the KVM R/O memory region requests could allow to further reduce
> permissions (assuming your mmap is R/W you could map it R/O into the KVM
> MMU), but I might remember things incorrectly.
>

I'm thinking about the opposite case where KVM is configured for R/W but
the mmap is more restrictive. This patch series makes this scenario
possible.
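To make that scenario concrete, here is a minimal sketch of the mapping
pattern under discussion (illustrative only, not code from this series;
the sizes, offsets, and fd handling are made up): the frontend reserves
the whole VIRTIO Shared Memory Region with PROT_NONE and later maps a
backend-provided fd over part of it with MAP_FIXED, possibly read-only,
while the kvm.ko memory region covering it may still be read/write.

  /* Sketch: a read-only SHMEM_MAP mapping inside a region that the
   * kvm.ko memslot may still treat as read/write. */
  #define _GNU_SOURCE
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>

  int main(void)
  {
      size_t region_size = 16 * 4096; /* whole VIRTIO Shared Memory Region */
      size_t map_offset  = 4 * 4096;  /* hypothetical SHMEM_MAP shm_offset */
      size_t map_size    = 4 * 4096;  /* hypothetical SHMEM_MAP len */
      int fd = -1;                    /* fd received from the backend */

      /* Reserve the whole region so an unrelated mmap(NULL, ...) can
       * never land inside it (the security point raised above). */
      void *base = mmap(NULL, region_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (base == MAP_FAILED) {
          perror("mmap reserve");
          return EXIT_FAILURE;
      }

      /* SHMEM_MAP with read-only permission: replace part of the
       * reservation with a MAP_FIXED, PROT_READ mapping of the fd.
       * A guest store to this range then faults on the host side. */
      if (fd >= 0 &&
          mmap((char *)base + map_offset, map_size, PROT_READ,
               MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
          perror("mmap fd");
      }

      /* SHMEM_UNMAP: return the range to the inaccessible reservation. */
      if (mmap((char *)base + map_offset, map_size, PROT_NONE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
          perror("mmap unmap");
      }

      munmap(base, region_size);
      return 0;
  }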
>
> > In other words, the kvm.ko memory region would allow the
> > access but the Linux virtual memory configuration would cause a page
> > fault.
> >
> > For example, imagine a QEMU MemoryRegion containing a SHMEM_MAP
> > mapping with PROT_READ. The kvm.ko memory region would be read/write
> > (unless extra steps were taken to tell kvm.ko about the permissions).
> > When the guest stores to the PROT_READ page, kvm.ko will process a
> > fault...and I'm not sure what happens next.
> >
> > A similar scenario occurs when a PROT_NONE mapping exists within a
> > kvm.ko memory region. I don't remember how kvm.ko behaves when the
> > guest tries to access the pages.
> >
> > It's worth figuring this out before going further because it could
> > become tricky if issues are discovered later. I have CCed David
> > Hildenbrand in case he knows.
>
> One relevant piece is likely:
>
> "When the KVM_CAP_SYNC_MMU capability is available, changes in the
> backing of the memory region are automatically reflected into the guest.
> For example, an mmap() that affects the region will be made visible
> immediately."
>
> We can already effectively get R/O or PROT_NONE PTEs in
> PROT_READ|PROT_WRITE mappings, and the KVM must be able to handle that
> properly -- triggering a page fault to let core-MM resolve that.
>
> If we have a PROT_NONE VMA and the guest writes to it, we'd likely end
> up (to resolve the KVM MMU page fault) in
> hva_to_pfn_slow()->get_user_pages_unlocked(), which would return -EFAULT.
>
> Not sure if we really inject a page fault into the guest or if the KVM
> run would effectively fail with -EFAULT and make user space unhappy.
> Certainly something to play with!
>

I think KVM probably treats the fault as a misconfiguration that host
user space needs to resolve. There might be no way to inject a page
fault into the guest because the access is allowed according to the
guest page tables and the guest fault handlers aren't prepared to deal
with a spurious fault. Let's see what happens in practice.

I think the cleanest solution would be to create separate kvm.ko memory
regions with the appropriate permissions (i.e. might require creating
multiple MemoryRegions inside QEMU for a single VIRTIO Shared Memory
Region when permissions vary between mmaps).

Stefan

> --
> Cheers,
>
> David / dhildenb
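For illustration, here is a rough sketch of the "separate MemoryRegion
per mapping" idea on the QEMU side (not code from this series; the
helper name and arguments are hypothetical, and the exact signature of
memory_region_init_ram_from_fd() differs between QEMU versions):

  /* Sketch: give each SHMEM_MAP mapping its own MemoryRegion so that a
   * read-only mapping becomes a read-only kvm.ko memory region instead
   * of a read/write slot over PROT_READ pages. */
  #include "qemu/osdep.h"
  #include "exec/memory.h"
  #include "qapi/error.h"

  /* 'container' is assumed to be the MemoryRegion that the transport
   * exposes to the guest as the VIRTIO Shared Memory Region. */
  static void shmem_map_one(MemoryRegion *container, const char *name,
                            int fd, uint64_t fd_offset, uint64_t shm_offset,
                            uint64_t len, bool writable)
  {
      MemoryRegion *mr = g_new0(MemoryRegion, 1);

      /* Back the subregion directly with the backend-supplied fd. */
      memory_region_init_ram_from_fd(mr, NULL, name, len, RAM_SHARED,
                                     fd, fd_offset, &error_fatal);

      /* A read-only mapping gets a read-only memslot, so guest writes
       * go through KVM's read-only handling rather than faulting on a
       * PROT_READ page. */
      if (!writable) {
          memory_region_set_readonly(mr, true);
      }

      memory_region_add_subregion(container, shm_offset, mr);
  }

With one subregion per mapping, the memory listener can register a
kvm.ko memory region whose permissions match the mmap, so a permission
mismatch would be caught by KVM instead of surfacing as a host-side
page fault.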