Hi,

On Thu, 2026-06-18 at 17:56 +0200, Philipp Stanner wrote:
> +Cc Danilo
> 
> On Thu, 2026-06-18 at 15:03 +0100, André Draszik wrote:
> > Since commit 541c8f2468b9 ("dma-buf: detach fence ops on signal v3"),
> > I'm seeing the BUG_ON() triggering in drm_crtc's fence_to_crtc() via
> > drm_crtc_fence_get_driver_name() regularly:
> > 
> >     Call trace:
> >      panic+0x58/0x5c
> >      die+0x160/0x178
> >      bug_brk_handler+0x70/0xa4
> >      call_el1_break_hook+0x3c/0x1a0
> >      do_el1_brk64+0x24/0x74
> >      el1_brk64+0x34/0x54
> >      el1h_64_sync_handler+0x80/0xfc
> >      el1h_64_sync+0x84/0x88
> >      drm_crtc_fence_get_driver_name+0x60/0x68 (P)
> >      sync_file_get_name+0x184/0x45c
> >      sync_file_ioctl+0x404/0xf70
> >      __arm64_sys_ioctl+0x124/0x1dc
> > 
> > This looks to be caused by a code flow similar to the following:
> > 
> > +++ snip +++
> > thread A                             thread B
> > 
> >                                      ioctl(SYNC_IOC_FILE_INFO)
> >                                      sync_file_ioctl()
> >                                      sync_file_get_name()
> > dma_fence_signal_timestamp_locked()  dma_fence_driver_name()
> >                                        ops = rcu_dereference(fence->ops)
> >                                        if (!dma_fence_test_signaled_flag())
> >                                          ops->get_driver_name(fence) i.e.
> >                                          drm_crtc_fence_get_driver_name()
> > test_and_set_bit(SIGNALED)
> > RCU_INIT_POINTER(fence->ops, NULL)
> >                                      drm_crtc_fence_get_driver_name()
> >                                        BUG_ON(rcu_access_pointer(fence->ops)
> >                                               != &drm_crtc_fence_ops)
> 
> Now this looks like a very similar problem that I have recently been
> concerned with:
> 
> https://lore.kernel.org/dri-devel/[email protected]/
> 
> https://lore.kernel.org/dri-devel/[email protected]/
> 
> 
> I continue to believe because of bugs like this and the ones I have
> quoted in the threads above the robustness of the kernel could be
> greatly improved if we could get dma_fence fully synchronized with its
> lock.

On top of that, sashiko highlighted  (via my other patch) that the existing
code is missing some memory barriers:

https://sashiko.dev/#/patchset/[email protected]?part=1

I believe Lock synchronization would resolve that (as would adding explicit
memory barriers).

[...]

> > diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> > index 63ead8ba6756..31c8636e7467 100644
> > --- a/drivers/gpu/drm/drm_crtc.c
> > +++ b/drivers/gpu/drm/drm_crtc.c
> > @@ -73,6 +73,9 @@
> >   * &drm_mode_config_funcs.atomic_check.
> >   */
> >  
> > +#define fence_to_crtc(f) container_of((f)->extern_lock, \
> > +                                 struct drm_crtc, fence_lock)
> 
> I agree that macros should be avoided if possible.

No problem, I'll change that.

> 
> > +
> >  /**
> >   * drm_crtc_from_index - find the registered CRTC at an index
> >   * @dev: DRM device
> > @@ -154,14 +157,6 @@ static void drm_crtc_crc_fini(struct drm_crtc *crtc)
> >  #endif
> >  }
> >  
> > -static const struct dma_fence_ops drm_crtc_fence_ops;
> > -
> > -static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
> > -{
> > -   BUG_ON(rcu_access_pointer(fence->ops) != &drm_crtc_fence_ops);
> 
> +1
> 
> BUG_ON is more or less deprecated and should not be used anymore. There
> needs to be bombastic justification for shooting down the entire
> kernel.

Yes, I meant to mention that in my commit message as well.

Now, you might have seen that sashiko highlighted that the existing
BUG_ON() is masking a potential use-after-free during driver removal which
I believe to be a correct observation. It suggests two alternative ways to
resolve it:

https://sashiko.dev/#/patchset/[email protected]?part=1

> Does the CRTC or DRM device need to be kept alive for the RCU grace period,
> or should the fence hold a proper reference to prevent the use-after-free
> when get_driver_name() and get_timeline_name() access the freed CRTC
> structure?

Do you guys have any preference on that? It appears the use-after-free
should be resolved before merging the removal of the BUG_ON(), and I'd like
to progress on this.

Cheers,
Andre'

Reply via email to