On Sun, Jul 24, 2022 at 12:09:50PM +0100, Chris Narkiewicz wrote:
> Hi,
>
> Some time ago I posted a bug on bugs@ (subject: X11 hangs on StarLabs Mk IV -
> snapshot 06-06-2022).
> We did some checks with help of one of the developers, but we could not find
> a cause then.
>
> I started digging deeper and I think I found a cause of the lockup, but I'm
> struggling to get to the bottom of it.
>
> All the source code with my patches are on my github. Apologies if somebody
> doesn't do github - I could not find a better way to share this messy set
> of changes:
>
> https://github.com/ezaquarii/xenocara/tree/bug
> https://github.com/ezaquarii/xenocara/commit/1d6e50bf668adfc07a4da0860d6c8f738ec1228a
>
> https://github.com/ezaquarii/src/tree/bug
> https://github.com/ezaquarii/src/commit/0047e0f206896aa5287cad250c6bee1c994cdf88
>
> Playing with a debugger first, I managed to find that it locks in ioctl()
> originating from _mesa_MapBuffer and ending up in libiris. I put printfs to
> demonstrate this and also the output in attachment.

to see ioctls you can use ktrace/kdump

> Then, I instrumented kernel with some more printfs and I located a
> place in DRM code where the task is awaiting wakeup - infinitely - inside
> drm_syncobj_array_wait_timeout. Due to a large number of printfs line
> locations do not make much sense, so here is the exact location in my git
> source tree:
>
> https://github.com/ezaquarii/src/commit/0047e0f206896aa5287cad250c6bee1c994cdf88#r79272404
>
> The timeout value sent via ioctl() is "infinite", so it hangs there
> infinitely.
> I also did a nasty experiment by overriding it to some large but finite
> number here:
> https://github.com/ezaquarii/src/commit/0047e0f206896aa5287cad250c6bee1c994cdf88#r79273004

what value do you mean by infinite?

for Linux's MAX_SCHEDULE_TIMEOUT we use INT32_MAX (0x7fffffff)

tsleep(9) with a timo of 0 is how a process sleeps without a timeout

drm_syncobj_array_wait_timeout() calls schedule_timeout() which calls
sleep_setup() with timo 0 if the argument is MAX_SCHEDULE_TIMEOUT.

A sleep without a timeout is not a problem in itself, as a wakeup
would come from another part of the kernel.

It may be interesting to see what the i965 Mesa driver does.
You can force it to load with MESA_LOADER_DRIVER_OVERRIDE=i965 in your
environment or try moving /usr/X11R6/lib/modules/dri/iris_dri.so away
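
To illustrate the timeout mapping described above, here is a rough userland
sketch. It is not the actual kernel code: timeout_to_timo() is a hypothetical
helper name, and the only values taken from this message are INT32_MAX for
MAX_SCHEDULE_TIMEOUT and timo 0 meaning "sleep without a timeout". It assumes
the caller passes a positive tick count.

```c
#include <assert.h>
#include <stdint.h>

/* Value used for Linux's MAX_SCHEDULE_TIMEOUT, as stated above. */
#define MAX_SCHEDULE_TIMEOUT	INT32_MAX

/*
 * Hypothetical helper (not a real kernel function): pick the timo
 * value to hand to the sleep primitive.  A timo of 0 means "sleep
 * until woken up", so the "infinite" sentinel maps to 0 and any
 * other value is passed through unchanged.
 */
static long
timeout_to_timo(long timeout)
{
	/* Assumption of this sketch: timeouts are positive tick counts. */
	assert(timeout > 0);

	if (timeout == MAX_SCHEDULE_TIMEOUT)
		return 0;	/* no timeout, wait for a wakeup */
	return timeout;		/* finite timeout, in ticks */
}
```

With this model an "infinite" timeout from the ioctl ends up as timo 0, so the
sleep only ends when something else in the kernel issues the wakeup.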