On Fri, 17 Oct 2025 11:57:08 -0400 Faith Ekstrand <[email protected]> wrote:
> On Fri, Oct 17, 2025 at 11:50 AM Boris Brezillon
> <[email protected]> wrote:
> >
> > +Matt for my comment on PVR having the same issue.
> >
> > On Fri, 17 Oct 2025 11:35:54 -0400
> > Faith Ekstrand <[email protected]> wrote:
> >
> > > On Fri, Oct 17, 2025 at 11:27 AM Boris Brezillon
> > > <[email protected]> wrote:
> > > >
> > > > On Fri, 17 Oct 2025 10:40:46 -0400
> > > > Faith Ekstrand <[email protected]> wrote:
> > > >
> > > > > On Fri, Oct 17, 2025 at 10:32 AM Faith Ekstrand <[email protected]>
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Oct 15, 2025 at 12:04 PM Boris Brezillon
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > Prepare things for standardizing synchronization around CPU accesses
> > > > > > > of GEM buffers. This will be used to provide default
> > > > > > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > > > > > a way for drivers to add their own ioctls to synchronize CPU
> > > > > > > writes/reads when they can't do it directly from userland.
> > > > > > >
> > > > > > > v2:
> > > > > > > - New commit
> > > > > > >
> > > > > > > v3:
> > > > > > > - No changes
> > > > > > >
> > > > > > > v4:
> > > > > > > - Add Steve's R-b
> > > > > > >
> > > > > > > Signed-off-by: Boris Brezillon <[email protected]>
> > > > > > > Reviewed-by: Steven Price <[email protected]>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > > > > >  include/drm/drm_gem.h     | 45 +++++++++++++++++++++++++++++++++++++++
> > > > > > >  2 files changed, 55 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > > > > > index a1a9c828938b..a1431e4f2404 100644
> > > > > > > --- a/drivers/gpu/drm/drm_gem.c
> > > > > > > +++ b/drivers/gpu/drm/drm_gem.c
> > > > > > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL(drm_gem_vunmap);
> > > > > > >
> > > > > > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > > > > > +		 enum drm_gem_object_access_flags access)
> > > > > > > +{
> > > > > > > +	if (obj->funcs->sync)
> > > > > > > +		return obj->funcs->sync(obj, offset, size, access);
> > > > > > > +
> > > > > > > +	return 0;
> > > > > > > +}
> > > > > > > +EXPORT_SYMBOL(drm_gem_sync);
> > > > > > > +
> > > > > > >  /**
> > > > > > >   * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > > > > >   * the lock on an array of GEM objects.
> > > > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > > > index 8d48d2af2649..1c33e59ab305 100644
> > > > > > > --- a/include/drm/drm_gem.h
> > > > > > > +++ b/include/drm/drm_gem.h
> > > > > > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > > > > >  	DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > > > > > >  };
> > > > > > >
> > > > > > > +/**
> > > > > > > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> > > > > > > + */
> > > > > > > +enum drm_gem_object_access_flags {
> > > > > > > +	/** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > > > > > +	DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > > > > > +
> > > > > > > +	/** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > > > > > +	DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > > > > > +
> > > > > > > +	/** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > > > > > +	DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > > > > > > +
> > > > > > > +	/** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > > > > > +	DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > > > > > +
> > > > > > > +	/** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > > > > > +	DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > > > > > +
> > > > > > > +	/** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > > > > > > +	DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > > > > > +				   DRM_GEM_OBJECT_WRITE_ACCESS,
> > > > > > > +
> > > > > > > +	/** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > > > > > +	DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > > > > > > +};
> > > > > > > +
> > > > > > >  /**
> > > > > > >   * struct drm_gem_object_funcs - GEM object functions
> > > > > > >   */
> > > > > > > @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> > > > > > >  	 */
> > > > > > >  	int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
> > > > > > >
> > > > > > > +	/**
> > > > > > > +	 * @sync:
> > > > > > > +	 *
> > > > > > > +	 * Prepare for CPU/device access. This can involve migration of
> > > > > > > +	 * a buffer to the system-RAM/VRAM, or for UMA, flushing/invalidating
> > > > > > > +	 * the CPU caches. The range can be used to optimize the synchronization
> > > > > > > +	 * when possible.
> > > > > >
> > > > > > This has gone in a very different direction from the version I sent
> > > > > > out and the added generality makes me really nervous. The idea of sync
> > > > > > involving migration and that the range is a mere hint are antithetical
> > > > > > with Vulkan. It's a very GLish design that assumes that a BO is
> > > > > > exclusively used by one of the CPU or the GPU at the same time. This
> > > > > > simply isn't the case in modern APIs. Older DRM uAPIs (as well as
> > > > > > dma-buf itself) are littered with such ioctls and we're in the process
> > > > > > of deleting them all.
> > > > >
> > > > > And yes, I realize I sent this on the patch for the hook which you
> > > > > intended to plumb through to dma-buf. However, I also saw it being
> > > > > propagated to an ioctl and I didn't know where else to put it that had
> > > > > the relevant details.
> > > > >
> > > > > ~Faith
> > > > >
> > > > > > If the BO needs to be migrated in order to be accessed from the CPU,
> > > > > > that needs to happen on map, not on some sort of begin/end. Or better
> > > > > > yet, just disallow mapping such buffers. Once the client has a map,
> > > > > > they are free to access from the CPU while stuff is running on the
> > > > > > GPU. They have to be careful, of course, not to cause data races, but
> > > > > > accessing the same BO from the CPU and GPU or even the same range is
> > > > > > totally okay if you aren't racing.
> > > > > >
> > > > > > As a corollary, just don't map PRIME buffers.
> > > > > >
> > > > > > And the range really shouldn't be just a hint. With Vulkan, clients
> > > > > > are regularly sub-allocating from larger memory objects. If they ask
> > > > > > to flush 64B and end up flushing 64M, that's pretty bad.
> > > > > >
> > > > > > All we need is something which lets us trap through to the kernel for
> > > > > > CPU cache management. That's all we need and that's really all it
> > > > > > should do.
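
To make sure we're picturing the same thing, here is roughly what I have
in mind for the userland side of that trap. Untested sketch, and the ioctl
name/number and struct layout below are placeholders for the sake of the
discussion, not an actual uAPI:

/*
 * Untested sketch, for discussion only: the ioctl name, request number and
 * struct layout are made up.
 */
#include <stdint.h>
#include <sys/ioctl.h>

#define PANTHOR_BO_SYNC_READ  (1 << 0)	/* invalidate before CPU reads */
#define PANTHOR_BO_SYNC_WRITE (1 << 1)	/* clean after CPU writes */

struct drm_panthor_bo_sync {
	uint32_t handle;	/* GEM handle */
	uint32_t flags;		/* PANTHOR_BO_SYNC_{READ,WRITE} */
	uint64_t offset;	/* start of the range, in bytes */
	uint64_t size;		/* exact length of the range, not a hint */
};

/* Placeholder request number; the real one would come from the uAPI header. */
#define DRM_IOCTL_PANTHOR_BO_SYNC 0

/*
 * vkFlush/InvalidateMappedMemoryRanges() would boil down to one of these
 * calls per range, covering exactly the bytes the app asked for.
 */
static int bo_sync_range(int fd, uint32_t handle, uint64_t offset,
			 uint64_t size, uint32_t flags)
{
	struct drm_panthor_bo_sync req = {
		.handle = handle,
		.flags = flags,
		.offset = offset,
		.size = size,
	};

	return ioctl(fd, DRM_IOCTL_PANTHOR_BO_SYNC, &req);
}

The important part being that offset/size describe exactly the bytes to
maintain, so a Vulkan sub-allocation flush stays a sub-allocation flush.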
> > > > Okay, so there's actually a problem with that I think, because we can't
> > > > know how the buffer we export will be used. It can be imported by the
> > > > same driver, and we're all good, but it can also be imported by a
> > > > different driver, which decides to vmap or allow mmap() on it, and then
> > > > we have to implement the dma_buf CPU sync hooks. Unless we decide that
> > > > all exported buffers should be write-combine only? This is the very
> > > > reason I started hooking things up on the dma_buf side, because we're
> > > > not in control of who the importer of our buffers is.
> > >
> > > Exported buffers should be WC-only. We could try to get creative but
> > > the moment we let the lack of coherency leak to other processes,
> > > potentially to other drivers, we're in a world of hurt. Even with the
> > > dma-buf begin/end hooks, if it's imported into a driver that does
> > > Vulkan, those hooks don't make sense and we're screwed.
> >
> > Well, yeah, the 'entire-buf' granularity is a problem, indeed,
>
> If all that's being done is cache flushing, it's kind of okay. It's a
> big hammer but it's okay. However, if the driver is doing literally
> anything else in begin/end, it's all a lie the moment you allow that
> buffer to be mapped in Vulkan. The client may be reading from the CPU
> for GPU download in one subrange, writing from the CPU for GPU upload
> in another subrange, and reading/writing from the GPU in another
> subrange all at the same time. That's totally incompatible with the
> dma-buf begin/end model.
>
> > and
> > there's no way around it with the current dma-buf API, which is why I
> > prevented external bufs from being mapped (AKA not host-visible in
> > Vulkan's term). But I really thought imported buffers coming from
> > panthor should be mapped cached.
>
> Yeah. So we could potentially allow WB maps if the client requests an
> export via OPAQUE_FD since we know a priori that such a buffer will
> only ever be re-imported into panfrost/panvk. However, I really don't
> think that's a common enough case to be worth optimizing just yet.
>
> > > And, yes, I
> > > know panvk is the only Vulkan implementation you're going to see on a
> > > system with an Arm GPU, but thinking about things in the general case
> > > across all of DRM, exporting non-coherent memory in 2025 is just
> > > cursed.
> >
> > Hm, okay. If that's the way to go, then we should enforce
> >
> > !WB_MMAP || !PRIVATE
> >
> > in panthor, and fail the export of a WB_MMAP buffer in panfrost (we
> > don't have a way to know that a buffer is private there).
>
> Yeah, I'm a fan of that for now.
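
Concretely, something along these lines for panthor (untested sketch; the
WB_MMAP flag, the is_private check and the function names are placeholders,
not the actual implementation):

#include <linux/err.h>
#include <drm/drm_gem.h>
#include <drm/drm_prime.h>

/*
 * Untested sketch: reject the "CPU-cached mmap + exportable" combination.
 * Flag and helper names are placeholders.
 */
static int panthor_gem_check_flags(u32 flags, bool is_private)
{
	/* A WB (cached) CPU mapping is only allowed on private BOs that
	 * can never leave the driver.
	 */
	if ((flags & DRM_PANTHOR_BO_WB_MMAP) && !is_private)
		return -EINVAL;

	return 0;
}

/* Belt-and-braces: also refuse to export anything that got a WB mapping. */
static struct dma_buf *
panthor_gem_prime_export(struct drm_gem_object *obj, int prime_flags)
{
	struct panthor_gem_object *bo = to_panthor_bo(obj);

	if (bo->flags & DRM_PANTHOR_BO_WB_MMAP)
		return ERR_PTR(-EINVAL);

	return drm_gem_prime_export(obj, prime_flags);
}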

Might also require more changes when the GPU is IO-coherent, because
IO-coherency is a per-device thing on Arm. Right now we forcibly map
things WB even if the user didn't ask for it when the GPU is coherent,
but the importer might not be IO-coherent. I think it's fine in
Panthor because we can properly define the shareability/cacheability
attributes, but on older Mali gens, I remember hitting some issues.
For instance, we had issues when mapping things uncached on an
IO-coherent Amlogic integration of the G52.

I've managed to disable coherency on my VIM3 with a few hacks
(switching to the new page table format was one of them, but I'm not
sure what I did is portable to older gens that only support the old
page table format), but I'd have to make sure this can be done on a
per-mapping basis. Steve probably knows more about that.

TL;DR: maybe we should focus on Panthor first, so we don't block this
series on changes that risk breaking old HW in Panfrost.
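
FWIW, the default dma-buf hooks mentioned in the commit message could be as
simple as the sketch below. Only drm_gem_sync() and the access flags come
from this patch; the helper name and the dma_data_direction-to-flags mapping
are guesses on my end, untested:

#include <linux/dma-buf.h>
#include <drm/drm_gem.h>
#include <drm/drm_prime.h>

/* Map the dma-buf direction onto the GEM access flags (tentative mapping). */
static enum drm_gem_object_access_flags
drm_gem_cpu_access_flags(enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_FROM_DEVICE:
		return DRM_GEM_OBJECT_CPU_ACCESS | DRM_GEM_OBJECT_READ_ACCESS;
	case DMA_TO_DEVICE:
		return DRM_GEM_OBJECT_CPU_ACCESS | DRM_GEM_OBJECT_WRITE_ACCESS;
	default:
		return DRM_GEM_OBJECT_CPU_ACCESS | DRM_GEM_OBJECT_RW_ACCESS;
	}
}

int drm_gem_dmabuf_begin_cpu_access(struct dma_buf *dma_buf,
				    enum dma_data_direction dir)
{
	struct drm_gem_object *obj = dma_buf->priv;

	/* dma-buf carries no range information, so sync the whole object. */
	return drm_gem_sync(obj, 0, obj->size,
			    drm_gem_cpu_access_flags(dir));
}

int drm_gem_dmabuf_end_cpu_access(struct dma_buf *dma_buf,
				  enum dma_data_direction dir)
{
	struct drm_gem_object *obj = dma_buf->priv;

	/* Hand the buffer back to the device once the CPU is done. */
	return drm_gem_sync(obj, 0, obj->size,
			    DRM_GEM_OBJECT_DEV_ACCESS |
			    (drm_gem_cpu_access_flags(dir) &
			     DRM_GEM_OBJECT_ACCESS_TYPE_MASK));
}

The whole-object range is exactly the 'big hammer' discussed above, which is
why this only makes sense as a fallback for imported/exported buffers, not as
the path Vulkan would use for its own allocations.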
