On Fri, Oct 17, 2025 at 11:27 AM Boris Brezillon <[email protected]> wrote:
>
> On Fri, 17 Oct 2025 10:40:46 -0400
> Faith Ekstrand <[email protected]> wrote:
>
> > On Fri, Oct 17, 2025 at 10:32 AM Faith Ekstrand <[email protected]> wrote:
> > >
> > > On Wed, Oct 15, 2025 at 12:04 PM Boris Brezillon
> > > <[email protected]> wrote:
> > > >
> > > > Prepare things for standardizing synchronization around CPU accesses
> > > > of GEM buffers. This will be used to provide default
> > > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > > a way for drivers to add their own ioctls to synchronize CPU
> > > > writes/reads when they can't do it directly from userland.
> > > >
> > > > v2:
> > > > - New commit
> > > >
> > > > v3:
> > > > - No changes
> > > >
> > > > v4:
> > > > - Add Steve's R-b
> > > >
> > > > Signed-off-by: Boris Brezillon <[email protected]>
> > > > Reviewed-by: Steven Price <[email protected]>
> > > > ---
> > > >  drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > >  include/drm/drm_gem.h     | 45 +++++++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 55 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > > index a1a9c828938b..a1431e4f2404 100644
> > > > --- a/drivers/gpu/drm/drm_gem.c
> > > > +++ b/drivers/gpu/drm/drm_gem.c
> > > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > >  }
> > > >  EXPORT_SYMBOL(drm_gem_vunmap);
> > > >
> > > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > > +                enum drm_gem_object_access_flags access)
> > > > +{
> > > > +       if (obj->funcs->sync)
> > > > +               return obj->funcs->sync(obj, offset, size, access);
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +EXPORT_SYMBOL(drm_gem_sync);
> > > >
> > > >  /**
> > > >   * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > >   * the lock on an array of GEM objects.
> > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > index 8d48d2af2649..1c33e59ab305 100644
> > > > --- a/include/drm/drm_gem.h
> > > > +++ b/include/drm/drm_gem.h
> > > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > >         DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > > >  };
> > > >
> > > > +/**
> > > > + * enum drm_gem_object_access_flags - bitmask describing GEM access types to prepare for
> > > > + */
> > > > +enum drm_gem_object_access_flags {
> > > > +       /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > > +       DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > > +
> > > > +       /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > > +       DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > > +
> > > > +       /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > > +       DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > > > +
> > > > +       /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > > +       DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > > +
> > > > +       /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > > +       DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > > +
> > > > +       /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for read/write accesses. */
> > > > +       DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > > +                                  DRM_GEM_OBJECT_WRITE_ACCESS,
> > > > +
> > > > +       /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > > +       DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > > > +};
> > > > +
> > > >  /**
> > > >   * struct drm_gem_object_funcs - GEM object functions
> > > >   */
> > > > @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> > > >          */
> > > >         int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
> > > >
> > > > +       /**
> > > > +        * @sync:
> > > > +        *
> > > > +        * Prepare for CPU/device access. This can involve migration of
> > > > +        * a buffer to system RAM/VRAM or, for UMA, flushing/invalidating
> > > > +        * the CPU caches. The range can be used to optimize the
> > > > +        * synchronization when possible.
> > >
> > > This has gone in a very different direction from the version I sent
> > > out, and the added generality makes me really nervous. The idea of
> > > sync involving migration, and of the range being a mere hint, is
> > > antithetical to Vulkan. It's a very GL-ish design that assumes a BO
> > > is exclusively used by either the CPU or the GPU at any given time.
> > > That simply isn't the case in modern APIs. Older DRM uAPIs (as well
> > > as dma-buf itself) are littered with such ioctls, and we're in the
> > > process of deleting them all.
> >
> > And yes, I realize I sent this on the patch for the hook which you
> > intended to plumb through to dma-buf. However, I also saw it being
> > propagated to an ioctl, and I didn't know where else to put it that
> > had the relevant details.
> >
> > ~Faith
> >
> > > If the BO needs to be migrated in order to be accessed from the
> > > CPU, that needs to happen on map, not on some sort of begin/end. Or
> > > better yet, just disallow mapping such buffers. Once the client has
> > > a map, they are free to access it from the CPU while stuff is
> > > running on the GPU. They have to be careful, of course, not to
> > > cause data races, but accessing the same BO from the CPU and GPU,
> > > or even the same range, is totally okay if you aren't racing.
> > >
> > > As a corollary: just don't map PRIME buffers.
> > >
> > > And the range really shouldn't be just a hint. With Vulkan, clients
> > > regularly sub-allocate from larger memory objects. If they ask to
> > > flush 64B and end up flushing 64M, that's pretty bad.
> > >
> > > All we need is something that lets us trap into the kernel for CPU
> > > cache management. That's all we need, and that's really all it
> > > should do.
>
> Okay, so there's actually a problem with that, I think, because we
> can't know how the buffer we export will be used. It can be imported
> by the same driver, and then we're all good, but it can also be
> imported by a different driver, which decides to vmap or allow mmap()
> on it, and then we have to implement the dma_buf CPU sync hooks.
> Unless we decide that all exported buffers should be write-combine
> only? This is the very reason I started hooking things up on the
> dma_buf side: we're not in control of who the importer of our buffers
> is.
Exported buffers should be WC-only. We could try to get creative, but the
moment we let the lack of coherency leak to other processes, and
potentially to other drivers, we're in a world of hurt. Even with the
dma-buf begin/end hooks, if a buffer is imported into a driver that does
Vulkan, those hooks don't make sense and we're screwed.

And, yes, I know panvk is the only Vulkan implementation you're going to
see on a system with an Arm GPU, but thinking about things in the general
case across all of DRM, exporting non-coherent memory in 2025 is just
cursed.

~Faith
