On 11/22/23, Zenaan Harkness <zen...@gmail.com> wrote:
> On 11/21/23, Michael Kjörling <2695bd53d...@ewoof.net> wrote:
>> On 21 Nov 2023 14:48 +1100, from zen...@gmail.com (Zenaan Harkness):
>>> The desktop displays, but my external HDDs have been put to sleep, and
>>> they do not wake up.
>>>
>>> One of them is zfs. The zfs mounts list shows, but any attempt to
>>> view/ls a zfs mount, just hangs permanently until a reboot.
>>>
>>> The other drive is an ext4 filesystem, and it has been completely
>>> un-mounted and the HDD spun down, and it does not spin up again -
>>> until a reboot.
>>
>> This doesn't sound right.
>>
>> Can you run hdparm -C on the affected devices at the time? What is the
>> result of that?
>
> So it seems I can test this quickly with a manual suspend, then do the
> various checks... it seems that the issue here is the
> auto-sleep/suspend.
>
> For starters, prior to suspend, I've removed the zfs drive, and just
> left the ext4 drive in the USB caddy (it holds up to 2 drives).
>
> Prior to suspend, I get, for the 2.5 inch hdd when it has not been
> accessed for a while and I can feel it is not spinning:
>
> # hdparm -C /dev/sda
> /dev/sda:
> drive state is: standby
>
> then, I ls'ed a dir in that drive that had not previously been
> accessed, and could feel it spin up and then give me the output, and
> then I ran hdparm again and interestingly, checking a few times on the
> now spun up drive, I get identical results as with the drive in the
> spun down state:
>
> # hdparm -C /dev/sda
> /dev/sda:
> drive state is: standby
>
> ----
> Now, after suspend (and wait for hdd to spin down, and wait for
> monitors to blank, and wait another 10s) and finally wake the computer
> up (which is really too slow - 20 or 30 seconds or so, so something
> odd or challenging seems to be happening inside the kernel somewhere):
>
> # ll /dev/sd*
> ls: cannot access '/dev/sd*': No such file or directory
>
> # hdparm -C /dev/sda
> /dev/sda: No such file or directory
>
>
>> Do the drives spin back up if you use hdparm -z?
>
> Prior to suspend and wake, I get this:
>
> # hdparm -z /dev/sda
> /dev/sda:
> re-reading partition table
> BLKRRPART failed: Device or resource busy
>
> And again, after suspend and wake there is no more /dev/sda, or any
> /dev/sd*, so I cannot run hdparm on any such device.
>
>
>> What is the exact kernel version you are running? Please provide both
>> the package name and exact package version, and the full output from
>> uname -a.
>
> # uname -a
> Linux zen-L7 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1
> (2023-09-29) x86_64 GNU/Linux
>
> The kernel package is exactly
> linux-image-6.1.0-13-amd64
>
>
>> Assuming that those drives are connected over USB, do they show up in
>> lsusb output while inaccessible?
>
> Prior to suspend and wake, lsusb shows me my hubs, dock, eth adaptors,
> trackball, and possibly the following is the HDD dock ? dunno:
>
> Bus 006 Device 015: ID 152d:0565 JMicron Technology Corp. / JMicron
> USA Technology Corp. JMS56x Series
>
> ... and sure enough after suspend and wake, Bus 006 Device 015 is
> gone, no longer exists, so it somehow has not woken up - but I CAN
> still see the blue light on the hdd caddy, but the hdd remains in a
> spun down/ sleep state, and no /dev/sd* device.

Apologies for the confusion: the paragraph above was inserted after I
did the suspend and wake cycle, whereas the following paragraphs were
written before it. Please read the following paragraphs as part of the
"Prior to suspend..." paragraph above.

> I do get these though (alias ll='ls -l'):
>
> # find /dev/disk/|grep usb
> /dev/disk/by-id/usb-WDC_WD20_SPZX-22UA7T0_RANDOM__3F4917AD758C-0:0
> /dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0
>
> # ll /dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0
> 0 lrwxrwxrwx 1 root root 9 20231122 10:33.10
> /dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0 ->
> ../../sda
>
> # ll /dev/sd*
> 0 brw-rw---- 1 root disk 8, 0 20231122 10:33.10 /dev/sda
>
> ... interestingly, it seems when I formatted this drive with ext4, I
> formatted ext4 on the whole disk (/dev/sda) without using partitions,
> and so it's just /dev/sda and not /dev/sda1, which has the ext4
> filesystem.
>
>
>> Is there anything relevant in dmesg output?
>
> This looks quite suspicious (some error lines, not all of dmesg output):
>
> [42635.638996] usb 6-2.3.1: device not accepting address 15, error -62
> [42668.986050] usb 6-2.3.1: USB disconnect, device number 15
> [42668.986406] device offline error, dev sda, sector 0 op 0x1:(WRITE)
> flags 0x800 phys_seg 0 prio class 2
> [42668.988647] hub 6-2.3.2:1.0: hub_ext_port_status failed (err = -71)
> [42668.990867] hub 6-2.3.2.3:1.0: hub_ext_port_status failed (err = -71)
> [42668.990888] hub 6-2.3.2.1:1.0: hub_ext_port_status failed (err = -71)
> [42669.007554] usb 6-2.3.2.3.1: Failed to suspend device, error -71
> [42669.008775] hub 6-2.3.2:1.0: hub_ext_port_status failed (err = -71)
> [42713.495809] xhci_hcd 0000:3a:00.0: Timeout while waiting for setup
> device command
> [42713.703761] usb 6-2.3.1: device not accepting address 19, error -62
> [42713.704792] usb 6-2.3-port1: unable to enumerate USB device
> [42713.708332] usb 6-2.3.2: USB disconnect, device number 5
> [42713.708343] usb 6-2.3.2.1: USB disconnect, device number 7
>
>
> since "2.3.1" appears in the drive links above, and 6 could be "Bus
> 6". I'm not familiar with dmesg output though...
>
> I also see the following, but that was earlier in the dmesg output and
> may relate to "quitting mpv" causing my desktop/wayland (or it seems,
> gnome shell) hang (a different email thread of mine):
>
> [41107.248174] gnome-shell[45289]: segfault at 0 ip 00007f8bc835f1e0
> sp 00007ffd720f9028 error 4 in
> libmutter-11.so.0.0.0[7f8bc824f000+15a000] likely on CPU 10 (core 4,
> socket 0)
>
>
>> Are you booting the kernel with any command-line parameters? Please
>> provide the exact contents of /proc/cmdline.
>
> Fresh Debian stable install, no customization by me at all:
>
> # cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-6.1.0-13-amd64
> root=UUID=9ce1e519-9712-4616-aeeb-0f858e5ac00a ro quiet
>
>
>> A spun-down drive can take a brief time to spin back up (typically on
>> the order of a few seconds), but that SHOULD be handled automatically;
>> clearly something odd is going on in your case if it doesn't.
>
> Indeed, something odd is up.
>
> So this hdd has, through suspend, disappeared. Its mount is gone,
> its /dev device is gone. There is a crypted mount which still shows,
> but of course any access gives io error, and umount and cryptsetup
> close cleans that up.
>
> In the past, and I suspect still, with the zfs drive, the problems
> above are impossible to clean up with zfs, e.g. run any zfs command
> such as `zfs list` or `zpool list` and the command hangs until reboot,
> and the drive cannot be used again, until a reboot, so the
> suspend/wake cycle is very problematic in this instance, I have to
> permanently disable auto suspend, and remember to not manually
> suspend, if I am in the middle of work on a zfs drive. At least, an
> externally attached zfs drive, but I suspect the same problem is with
> internal drive zfs mounts - zfs is just not properly integrated in
> relation to the linux kernel's suspend and resume, and zfs is not
> currently designed to cope with that... though I DO suspect if some
> sort of loop mount zfs filesystem-in-a-file were mounted only inside a
> virtual machine, and exported back to the host via samba or nfs, and
> the zfs 'filesystem in a file' exists on the host in an ext4
> filesystem, that then zfs may "cope" with suspend and resume, due to
> the cleaner nature of virtual machine "suspend" environment. Something
> to test one day...
>
>
>> “Remember when, on the Internet, nobody cared that you were a dog?”
>
> Bugger! I've been so deluded all this time...
> Oh well, better awake than deluded!
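
On the guess above that "2.3.1" in the by-path link and "6" in the dmesg
lines refer to the same device: that reading looks consistent, since the
kernel names USB devices as <bus>-<port chain>. The numbers after
"error" are negated Linux errno values: -62 is ETIME (timer expired) and
-71 is EPROTO (protocol error). Below is a rough Python sketch, not a
definitive tool, that pulls the port chain out of a by-path name and
lists the dmesg lines for that port with the errno decoded. The function
names and regular expressions are my own and purely illustrative; the
sample lines are copied from the dmesg excerpt above, and the errno
table is hard-coded for Linux and covers only the two codes seen here.

```python
import re

# Negated in the kernel log. Linux values, hard-coded on purpose so the
# sketch does not depend on the platform it runs on (assumption: Linux).
LINUX_ERRNO = {62: "ETIME", 71: "EPROTO"}

# "usb 6-2.3.1: ..." or "hub 6-2.3.2:1.0: ..." -> bus number, port chain
USB_DEV_RE = re.compile(r"\b(?:usb|hub) (\d+)-(\d+(?:\.\d+)*)")
# "error -62" or "(err = -71)" -> positive errno number
ERR_RE = re.compile(r"\b(?:error|err =) -(\d+)")
# "...-usb-0:2.3.1:1.0-scsi-..." -> port chain
BY_PATH_RE = re.compile(r"-usb-\d+:(\d+(?:\.\d+)*):")

SAMPLE_DMESG = [
    "[42635.638996] usb 6-2.3.1: device not accepting address 15, error -62",
    "[42668.986050] usb 6-2.3.1: USB disconnect, device number 15",
    "[42668.988647] hub 6-2.3.2:1.0: hub_ext_port_status failed (err = -71)",
    "[42669.007554] usb 6-2.3.2.3.1: Failed to suspend device, error -71",
    "[42713.703761] usb 6-2.3.1: device not accepting address 19, error -62",
    "[42713.704792] usb 6-2.3-port1: unable to enumerate USB device",
]


def port_chain_from_by_path(name):
    """Return the dotted USB port chain from a /dev/disk/by-path name."""
    match = BY_PATH_RE.search(name)
    return match.group(1) if match else None


def usb_events_for_port(dmesg_lines, port_chain):
    """Return (bus, errno name or None, line) for lines about port_chain."""
    events = []
    for line in dmesg_lines:
        dev = USB_DEV_RE.search(line)
        if dev is None or dev.group(2) != port_chain:
            continue
        err = ERR_RE.search(line)
        name = None
        if err is not None:
            code = int(err.group(1))
            name = LINUX_ERRNO.get(code, "errno %d" % code)
        events.append((int(dev.group(1)), name, line.strip()))
    return events


if __name__ == "__main__":
    chain = port_chain_from_by_path(
        "pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0"
    )
    for bus, err_name, line in usb_events_for_port(SAMPLE_DMESG, chain):
        print(bus, err_name, line)
```

To use it on a live system, one could feed it the lines of real dmesg
output instead of SAMPLE_DMESG. Note that it matches the port chain
exactly, so lines for the hubs further down the tree (such as 6-2.3.2)
are deliberately left out.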