Bug#1091893: linux-image-6.1.0-28-amd64: Watchdog detected hard LOCKUP on CPU 8, then CPU 0

Neal P. Murphy Thu, 09 Jan 2025 01:30:15 -0800

On Wed, 8 Jan 2025 22:15:48 +0100
Uwe Kleine-König <u.kleine-koe...@baylibre.com> wrote:


> Hello Neal,
> 
> On Wed, Jan 01, 2025 at 11:18:37PM -0500, Neal Murphy wrote:
> > Package: src:linux
> > Version: 6.1.119-1
> > Severity: critical
> > Justification: breaks the whole system
> > 
> > Dear Maintainer,
> > 
> > I plugged in my SSK NVME-to-USB3 adapter. I mounted it, checked it (without
> > writing anything), and unmounted it. The system displayed the '... has data
> > to be written ...' msg for quite a while. Around then, the system displayed
> > the watchdog error on CPU 8. Shortly after, it displayed a watchdog error on
> > CPU 0 and the system became unresponsive requiring a hard reset.
> > 
> > When I got the SSK, it worked well on the desktop. Months later, I had
> > problems with it, but didn't get any kernel oopses. The drive works OK on
> > my Asus laptop, so I'm beginning to suspect my desktop's hardware.
> > 
> > I'm reporting this because flaky hardware usually shouldn't cause a system
> > lockup.  
> 
> This is only half of the truth. In an ideal world it would be true,
> but in reality this often doesn't work.
> 
> There is another bug report that looks quite similar to yours:
> https://lore.kernel.org/all/bug-219532-208...@https.bugzilla.kernel.org%2F/.
> The most recent message in that thread (from Dec 1, 22:07) has a
> patch. It would be great if you could test that and report upstream.
> 
> Best regards
> Uwe

Hmmm. It's definitely a hardware (mainboard) issue of some kind.

Running Linux 6.11.5 from backports.
------------------------------
The device works fine plugged into a USB3.2 port in the back of the computer. 
It will mount and umount rapidly many times. I can read many GiB of data from 
it. I can write 10 GiB of data to it. I can let it sit idle for some minutes. 
No errors appear in syslog.

Plugged into one of the front USB3 ports, it works fine. For about a minute. 
Then the system produces variations of the following:
----
2025-01-09T03:20:26.514887-05:00 playground kernel: [596625.269156] sd 9:0:0:0: [sdd] tag#18 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
2025-01-09T03:20:26.514900-05:00 playground kernel: [596625.269479] sd 9:0:0:0: [sdd] tag#18 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
----
followed by more errors, until the drive is finally unmounted and
disconnected. The errors occur whether or not I do anything with the drive
(read, mount, read-write files, unmount, etc.)
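
For reference, the CDB bytes in the log excerpt can be decoded by hand. The following is only a minimal Python sketch, assuming the standard 10-byte SCSI READ(10) layout (opcode 0x28, flags, 4-byte big-endian LBA, group number, 2-byte transfer length, control). The function name decode_read10 is made up for illustration and is not part of any kernel or Debian tool.

```python
import struct


def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Hypothetical helper for reading kernel log lines; assumes the
    standard READ(10) layout and nothing device-specific.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    # Big-endian, no padding: opcode, flags, LBA (4 bytes), group number,
    # transfer length in blocks (2 bytes), control byte.
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    return {"opcode": opcode, "lba": lba, "blocks": blocks}


# CDB copied from the log excerpt
print(decode_read10("28 00 00 00 00 00 00 00 01 00"))
```

Under that assumption, the aborted command is a read of a single block at LBA 0, i.e. a very small transfer rather than a large or unusual request.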

If I plug the drive into a front port and do nothing with it, the errors occur 
after about 30 seconds.

Importantly, the system does *not* hang/crash when running 6.11.5; the errors 
are handled well.

Linux 6.1.0
-----------
As for Bookworm's 6.1 kernel, while I might have better luck patching/building 
the 6.1.0-28 kernel (trying to build 6.11 from backports was a Borg-ish 
experience), I would gladly run an xhci module patched/built by someone 
familiar with the Debian build methodology; it is alien territory for me. 
(Well, provided that the patch noted above is easily applied to 6.1.) If it has 
lots of debugging built in, even better.

Thanks,
Neal
