Hey Neal,

On Thu, Jan 09, 2025 at 04:27:10AM -0500, Neal P. Murphy wrote:
> On Wed, 8 Jan 2025 22:15:48 +0100
> Uwe Kleine-König <u.kleine-koe...@baylibre.com> wrote:
> 
> > Hello Neal,
> > 
> > On Wed, Jan 01, 2025 at 11:18:37PM -0500, Neal Murphy wrote:
> > > Package: src:linux
> > > Version: 6.1.119-1
> > > Severity: critical
> > > Justification: breaks the whole system
> > > 
> > > Dear Maintainer,
> > > 
> > > I plugged in my SSK NVME-to-USB3 adapter. I mounted it, checked it (without
> > > writing anything), and unmounted it. The system displayed the '... has data to
> > > be written ...' msg for quite a while. Around then, the system displayed the
> > > watchdog error on CPU 8. Shortly after, it displayed a watchdog error on CPU 0
> > > and the system became unresponsive requiring a hard reset.
> > > 
> > > When I got the SSK, it worked well on the desktop. Months later, I had problems
> > > with it, but didn't get any kernel oopses. The drive works OK on my Asus
> > > laptop, so I'm beginning to suspect my desktop's hardware.
> > > 
> > > I'm reporting this because flaky hardware usually shouldn't cause a system
> > > lockup.  
> > 
> > This is only half of the truth. In an ideal world it would be true,
> > but in reality this often doesn't work.
> > 
> > There is another bug report that looks quite similar to yours:
> > https://lore.kernel.org/all/bug-219532-208...@https.bugzilla.kernel.org%2F/.
> > The currently last message in that thread (from Dec 1, 22:07) has a
> > patch. It would be great if you could test that and report upstream.
> > 
> > Best regards
> > Uwe
> 
> Hmmm. It's definitely a hardware (mainboard) issue of some kind.
> 
> Running Linux 6.11.5 from backports.
> ------------------------------
> The device works fine plugged into a USB3.2 port in the back of the computer. 
> It will mount and umount rapidly many times. I can read many GiB of data from 
> it. I can write 10 GiB of data to it. I can let it sit idle for some minutes. 
> No errors appear in syslog.
> 
> Plugged into one of the front USB3 ports, it works fine. For about a minute. 
> Then the system produces variations of the following:
> ----
> 2025-01-09T03:20:26.514887-05:00 playground kernel: [596625.269156] sd 9:0:0:0: [sdd] tag#18 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
> 2025-01-09T03:20:26.514900-05:00 playground kernel: [596625.269479] sd 9:0:0:0: [sdd] tag#18 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00
> ----
> and more errors, finally unmounting and disconnecting the drive. The errors 
> occur whether or not I do anything with the drive (read, mount, read-write 
> files, unmount, etc.)
> 
> If I plug the drive into a front port and do nothing with it, the errors 
> occur after about 30 seconds.
> 
> Importantly, the system does *not* hang/crash when running 6.11.5; the errors 
> are handled well.

That's good news, thanks for your test.
 
> Linux 6.1.0
> -----------
> As for Bookworm's 6.1 kernel, while I might have better luck
> patching/building the 6.1.0-28 kernel (trying to build 6.11 from
> backports was a Borg-ish experience), I would gladly run an xhci
> module patched/built by someone familiar with the Debian build
> methodology; it is alien territory for me. (Well, provided that the
> patch noted above is easily applied to 6.1.) If it has lots of
> debugging built in, even better.

I'm inclined not to work on fixing 6.1 here. Someone could try to identify
the relevant changes between 6.1 and 6.11, but I guess that's tedious work,
and in the end it's probably not a single commit that needs backporting but
a whole bunch of commits. (That someone would probably have to be you, as
you have access to that hardware.)
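
As a rough sketch of how a first list of candidate commits could be
produced, assuming a checkout of the mainline kernel tree that has the
v6.1 and v6.11 tags: the path list below is only a guess at where the
relevant changes might live (xhci host driver and uas), not a confirmed
set, and the helper names are made up for illustration.

```python
import subprocess

# Assumption: these paths are a guess at where relevant changes might be,
# not a confirmed list.
PATHS = ("drivers/usb/host/", "drivers/usb/storage/uas.c")


def git_log_command(old="v6.1", new="v6.11", paths=PATHS):
    """Build a git command listing non-merge commits in old..new touching paths."""
    return ["git", "log", "--oneline", "--no-merges", f"{old}..{new}", "--", *paths]


def parse_oneline(output):
    """Split `git log --oneline` output into (abbreviated hash, subject) pairs."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip empty lines
        sha, _, subject = line.partition(" ")
        commits.append((sha, subject))
    return commits


def candidate_commits(repo, old="v6.1", new="v6.11", paths=PATHS):
    """Run git in `repo` (path to a kernel checkout) and return the parsed commits."""
    result = subprocess.run(
        git_log_command(old, new, paths),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_oneline(result.stdout)
```

Calling candidate_commits() with the path to a kernel checkout gives the
raw list; each entry would still have to be judged (and probably tested
on the affected hardware) by hand.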

So I suggest you just stick to the backport kernel until Debian 13.

Best regards
Uwe
