Quoting Tvrtko Ursulin (2018-05-08 18:47:42)
>
> On 08/05/2018 16:35, Chris Wilson wrote:
> > CI noticed
> >
> > <4>[ 23.430701] ============================================
> > <4>[ 23.430706] WARNING: possible recursive locking detected
> > <4>[ 23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted
> > <4>[ 23.430720] --------------------------------------------
> > <4>[ 23.430725] systemd-udevd/169 is trying to acquire lock:
> > <4>[ 23.430732] (ptrval) (&(&timeline->lock)->rlock){....}, at:
> > move_to_timeline+0x48/0x12c [i915]
> > <4>[ 23.430888]
> > but task is already holding lock:
> > <4>[ 23.430894] (ptrval) (&(&timeline->lock)->rlock){....}, at:
> > i915_request_submit+0x1a/0x40 [i915]
> > <4>[ 23.430995]
> > other info that might help us debug this:
> > <4>[ 23.431002] Possible unsafe locking scenario:
> >
> > <4>[ 23.431007] CPU0
> > <4>[ 23.431010] ----
> > <4>[ 23.431013] lock(&(&timeline->lock)->rlock);
> > <4>[ 23.431021] lock(&(&timeline->lock)->rlock);
> > <4>[ 23.431028]
> > *** DEADLOCK ***
> >
> > <4>[ 23.431036] May be due to missing lock nesting notation
> >
> > <4>[ 23.431044] 5 locks held by systemd-udevd/169:
> > <4>[ 23.431049] #0: (ptrval) (&dev->mutex){....}, at:
> > __driver_attach+0x42/0xe0
> > <4>[ 23.431065] #1: (ptrval) (&dev->mutex){....}, at:
> > __driver_attach+0x50/0xe0
> > <4>[ 23.431078] #2: (ptrval) (&dev->struct_mutex){+.+.}, at:
> > i915_gem_init+0xca/0x630 [i915]
> > <4>[ 23.431174] #3: (ptrval) (rcu_read_lock){....}, at:
> > submit_notify+0x35/0x124 [i915]
> > <4>[ 23.431271] #4: (ptrval) (&(&timeline->lock)->rlock){....},
> > at: i915_request_submit+0x1a/0x40 [i915]
> > <4>[ 23.431369]
> > stack backtrace:
> > <4>[ 23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted
> > 4.17.0-rc4-CI-CI_DRM_4156+ #1
> > <4>[ 23.431385] Hardware name: Dell Inc. OptiPlex GX280
> > /0G8310, BIOS A04 02/09/2005
> > <4>[ 23.431394] Call Trace:
> > <4>[ 23.431403] dump_stack+0x67/0x9b
> > <4>[ 23.431411] __lock_acquire+0xc67/0x1b50
> > <4>[ 23.431421] ? ring_buffer_lock_reserve+0x154/0x3f0
> > <4>[ 23.431429] ? lock_acquire+0xa6/0x210
> > <4>[ 23.431435] lock_acquire+0xa6/0x210
> > <4>[ 23.431530] ? move_to_timeline+0x48/0x12c [i915]
> > <4>[ 23.431540] _raw_spin_lock+0x2a/0x40
> > <4>[ 23.431634] ? move_to_timeline+0x48/0x12c [i915]
> > <4>[ 23.431730] move_to_timeline+0x48/0x12c [i915]
> > <4>[ 23.431826] __i915_request_submit+0xfa/0x280 [i915]
> > <4>[ 23.431923] i915_request_submit+0x25/0x40 [i915]
> > <4>[ 23.432024] i9xx_submit_request+0x11/0x140 [i915]
> > <4>[ 23.432120] submit_notify+0x8d/0x124 [i915]
> > <4>[ 23.432202] __i915_sw_fence_complete+0x81/0x250 [i915]
> > <4>[ 23.432300] __i915_request_add+0x31c/0x7c0 [i915]
> > <4>[ 23.432395] i915_gem_init+0x621/0x630 [i915]
> > <4>[ 23.432476] i915_driver_load+0xbee/0x10b0 [i915]
> > <4>[ 23.432485] ? trace_hardirqs_on_caller+0xe0/0x1b0
> > <4>[ 23.432566] i915_pci_probe+0x29/0x90 [i915]
> > <4>[ 23.432574] pci_device_probe+0xa1/0x130
> > <4>[ 23.432582] driver_probe_device+0x306/0x480
> > <4>[ 23.432589] __driver_attach+0xb7/0xe0
> > <4>[ 23.432596] ? driver_probe_device+0x480/0x480
> > <4>[ 23.432602] ? driver_probe_device+0x480/0x480
> > <4>[ 23.432609] bus_for_each_dev+0x74/0xc0
> > <4>[ 23.432616] bus_add_driver+0x15f/0x250
> > <4>[ 23.432623] ? 0xffffffffa02d7000
> > <4>[ 23.432629] driver_register+0x52/0xc0
> > <4>[ 23.432635] ? 0xffffffffa02d7000
> > <4>[ 23.432642] do_one_initcall+0x58/0x370
> > <4>[ 23.432653] ? do_init_module+0x1d/0x1ea
> > <4>[ 23.432660] ? rcu_read_lock_sched_held+0x6f/0x80
> > <4>[ 23.432667] ? kmem_cache_alloc_trace+0x282/0x2e0
> > <4>[ 23.432675] do_init_module+0x56/0x1ea
> > <4>[ 23.432682] load_module+0x2435/0x2b20
> > <4>[ 23.432694] ? __se_sys_finit_module+0xd3/0xf0
> > <4>[ 23.432701] __se_sys_finit_module+0xd3/0xf0
> > <4>[ 23.432710] do_syscall_64+0x55/0x190
> > <4>[ 23.432717] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > <4>[ 23.432724] RIP: 0033:0x7fa780782839
> > <4>[ 23.432729] RSP: 002b:00007ffcea73e668 EFLAGS: 00000246 ORIG_RAX:
> > 0000000000000139
> > <4>[ 23.432738] RAX: ffffffffffffffda RBX: 0000561a472a4b30 RCX:
> > 00007fa780782839
> > <4>[ 23.432745] RDX: 0000000000000000 RSI: 00007fa7804610e5 RDI:
> > 000000000000000e
> > <4>[ 23.432752] RBP: 00007fa7804610e5 R08: 0000000000000000 R09:
> > 00007ffcea73e780
> > <4>[ 23.432758] R10: 000000000000000e R11: 0000000000000246 R12:
> > 0000000000000000
> > <4>[ 23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15:
> > 0000561a472a4b30
> >
> > but did not report it as an issue as it only occurred during the first
> > module on boot. This is due to the removal of the distinct global
> > timeline, and its separate lock class. So instead mark up the expected
> > nesting. An alternative would be to define a separate lock class for the
> > engine, but since we only expect to have a single point of nesting, we
> > can avoid having multiple lock classes for the struct.
> >
> > Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual
> > timelines")
> > Signed-off-by: Chris Wilson <[email protected]>
> > Cc: Chris Wilson <[email protected]>
> > Cc: Tvrtko Ursulin <[email protected]>
> > Cc: Joonas Lahtinen <[email protected]>
> > ---
> > drivers/gpu/drm/i915/i915_request.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c
> > b/drivers/gpu/drm/i915/i915_request.c
> > index f336942229cf..8928894dd9c7 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -502,7 +502,7 @@ static void move_to_timeline(struct i915_request
> > *request,
> > GEM_BUG_ON(request->timeline == &request->engine->timeline);
> > lockdep_assert_held(&request->engine->timeline.lock);
> >
> > - spin_lock(&request->timeline->lock);
> > + spin_lock_nested(&request->timeline->lock, SINGLE_DEPTH_NESTING);
> > list_move_tail(&request->link, &timeline->requests);
> > spin_unlock(&request->timeline->lock);
> > }
> >
>
> Looks like I haven't been paying enough attention!
Nor me, and I've definitely run it through lockdep myself, at least I
think I have! (though admittedly less often than simple kasan runs as I
tend to be more concerned about accessing requests after they are freed).
-Chris
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx