+Matt for my comment on PVR having the same issue.

On Fri, 17 Oct 2025 11:35:54 -0400
Faith Ekstrand <[email protected]> wrote:

> On Fri, Oct 17, 2025 at 11:27 AM Boris Brezillon
> <[email protected]> wrote:
> >
> > On Fri, 17 Oct 2025 10:40:46 -0400
> > Faith Ekstrand <[email protected]> wrote:
> >  
> > > On Fri, Oct 17, 2025 at 10:32 AM Faith Ekstrand <[email protected]> wrote:
> > > >
> > > > On Wed, Oct 15, 2025 at 12:04 PM Boris Brezillon
> > > > <[email protected]> wrote:  
> > > > >
> > > > > Prepare things for standardizing synchronization around CPU accesses
> > > > > of GEM buffers. This will be used to provide default
> > > > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > > > a way for drivers to add their own ioctls to synchronize CPU
> > > > > writes/reads when they can't do it directly from userland.
> > > > >
> > > > > v2:
> > > > > - New commit
> > > > >
> > > > > v3:
> > > > > - No changes
> > > > >
> > > > > v4:
> > > > > - Add Steve's R-b
> > > > >
> > > > > Signed-off-by: Boris Brezillon <[email protected]>
> > > > > Reviewed-by: Steven Price <[email protected]>
> > > > > ---
> > > > >  drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > > >  include/drm/drm_gem.h     | 45 +++++++++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 55 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > > > index a1a9c828938b..a1431e4f2404 100644
> > > > > --- a/drivers/gpu/drm/drm_gem.c
> > > > > +++ b/drivers/gpu/drm/drm_gem.c
> > > > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > > >  }
> > > > >  EXPORT_SYMBOL(drm_gem_vunmap);
> > > > >
> > > > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > > > +                enum drm_gem_object_access_flags access)
> > > > > +{
> > > > > +       if (obj->funcs->sync)
> > > > > +               return obj->funcs->sync(obj, offset, size, access);
> > > > > +
> > > > > +       return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gem_sync);
> > > > > +
> > > > >  /**
> > > > >   * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > > >   * the lock on an array of GEM objects.
> > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > index 8d48d2af2649..1c33e59ab305 100644
> > > > > --- a/include/drm/drm_gem.h
> > > > > +++ b/include/drm/drm_gem.h
> > > > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > > >         DRM_GEM_OBJECT_ACTIVE    = BIT(2),
> > > > >  };
> > > > >
> > > > > +/**
> > > > > + * enum drm_gem_object_access_flags - bitmask describing GEM access types to prepare for
> > > > > + */
> > > > > +enum drm_gem_object_access_flags {
> > > > > +       /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > > > +       DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > > > +
> > > > > +       /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > > > +       DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > > > +
> > > > > +       /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > > > +       DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > > > > +
> > > > > +       /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > > > +       DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > > > +
> > > > > +       /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > > > +       DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > > > +
> > > > > +       /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for read/write accesses. */
> > > > > +       DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > > > +                                  DRM_GEM_OBJECT_WRITE_ACCESS,
> > > > > +
> > > > > +       /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > > > +       DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > > > > +};
> > > > > +
> > > > >  /**
> > > > >   * struct drm_gem_object_funcs - GEM object functions
> > > > >   */
> > > > > @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> > > > >          */
> > > > >         int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
> > > > >
> > > > > +       /**
> > > > > +        * @sync:
> > > > > +        *
> > > > > +        * Prepare for CPU/device access. This can involve migration
> > > > > +        * of a buffer to system RAM/VRAM, or, for UMA,
> > > > > +        * flushing/invalidating the CPU caches. The range can be
> > > > > +        * used to optimize the synchronization when possible.  
> > > >
> > > > This has gone in a very different direction from the version I sent
> > > > out, and the added generality makes me really nervous. The ideas that
> > > > sync involves migration and that the range is a mere hint are
> > > > antithetical to Vulkan. It's a very GLish design that assumes a BO is
> > > > exclusively used by either the CPU or the GPU at any given time. This
> > > > simply isn't the case in modern APIs. Older DRM uAPIs (as well as
> > > > dma-buf itself) are littered with such ioctls, and we're in the
> > > > process of deleting them all.  
> > >
> > > And yes, I realize I sent this on the patch for the hook you intended
> > > to plumb through to dma-buf. However, I also saw it being propagated
> > > to an ioctl, and I didn't know of a better place that had the
> > > relevant details.
> > >
> > > ~Faith
> > >  
> > > > If the BO needs to be migrated in order to be accessed from the CPU,
> > > > that needs to happen on map, not on some sort of begin/end. Or better
> > > > yet, just disallow mapping such buffers. Once the client has a map,
> > > > they are free to access from the CPU while stuff is running on the
> > > > GPU. They have to be careful, of course, not to cause data races, but
> > > > accessing the same BO from the CPU and GPU or even the same range is
> > > > totally okay if you aren't racing.
> > > >
> > > > As a corollary, just don't map PRIME buffers.
> > > >
> > > > And the range really shouldn't be just a hint. With Vulkan, clients
> > > > are regularly sub-allocating from larger memory objects. If they ask
> > > > to flush 64B and end up flushing 64M, that's pretty bad.
> > > >
> > > > All we need is something which lets us trap through to the kernel for
> > > > CPU cache management. That's all we need and that's really all it
> > > > should do.  
> >
> > Okay, so I think there's actually a problem with that: we can't know
> > how the buffer we export will be used. It can be imported by the same
> > driver, and we're all good, but it can also be imported by a different
> > driver, which decides to vmap or allow mmap() on it, and then we have
> > to implement the dma_buf CPU sync hooks. Unless we decide that all
> > exported buffers should be write-combine only? This is the very reason
> > I started hooking things up on the dma_buf side: we're not in control
> > of who the importer of our buffers is.  
> 
> Exported buffers should be WC-only. We could try to get creative but
> the moment we let the lack of coherency leak to other processes,
> potentially to other drivers, we're in a world of hurt. Even with the
> dma-buf begin/end hooks, if it's imported into a driver that does
> Vulkan, those hooks don't make sense and we're screwed.

Well, yeah, the 'entire-buf' granularity is indeed a problem, and
there's no way around it with the current dma-buf API, which is why I
prevented external bufs from being mapped (AKA not host-visible in
Vulkan terms). But I really thought imported buffers coming from
panthor should be mapped cached.
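
For reference, the current dma-buf CPU-access hooks take no range at
all, AFAICT, which is where the 'entire-buf' granularity comes from:

	int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
				     enum dma_data_direction direction);
	int dma_buf_end_cpu_access(struct dma_buf *dmabuf,
				   enum dma_data_direction direction);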

> And, yes, I
> know panvk is the only Vulkan implementation you're going to see on a
> system with an Arm GPU, but thinking about things in the general case
> across all of DRM, exporting non-coherent memory in 2025 is just
> cursed.

Hm, okay. If that's the way to go, then we should enforce

	!WB_MMAP || PRIVATE

in panthor, and fail the export of a WB_MMAP buffer in panfrost (we
don't have a way to know that a buffer is private there).
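
Something like this in the panthor BO creation path (rough sketch, the
flag names here are made up for illustration, not actual uAPI):

	/* CPU-cached mmap is only allowed on buffers that can never
	 * be exported. Flag names are placeholders.
	 */
	if ((args->flags & DRM_PANTHOR_BO_WB_MMAP) &&
	    !(args->flags & DRM_PANTHOR_BO_PRIVATE))
		return -EINVAL;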

I'll go and revisit the patchset to do that. I guess PVR needs fixing
too, because I haven't seen anything to prevent `CACHED + EXPORTABLE`
there.
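
On the panfrost side, the export path could grow a check along these
lines (again just a sketch; the wb_mmap field is hypothetical):

	static struct dma_buf *
	panfrost_gem_prime_export(struct drm_gem_object *obj, int flags)
	{
		struct panfrost_gem_object *bo = to_panfrost_bo(obj);

		/* Refuse to export CPU-cached BOs: panfrost can't tell
		 * private buffers apart, so cached implies non-exportable.
		 */
		if (bo->wb_mmap)
			return ERR_PTR(-EINVAL);

		return drm_gem_prime_export(obj, flags);
	}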
