On 11/22/23, Zenaan Harkness <zen...@gmail.com> wrote:
> On 11/21/23, Michael Kjörling <2695bd53d...@ewoof.net> wrote:
>> On 21 Nov 2023 14:48 +1100, from zen...@gmail.com (Zenaan Harkness):
>>> The desktop displays, but my external HDDs have been put to sleep, and
>>> they do not wake up.
>>>
>>> One of them is zfs. The zfs mounts list shows, but any attempt to
>>> view/ls a zfs mount, just hangs permanently until a reboot.
>>>
>>> The other drive is an ext4 filesystem, and it has been completely
>>> un-mounted and the HDD spun down, and it does not spin up again -
>>> until a reboot.
>>
>> This doesn't sound right.
>>
>> Can you run hdparm -C on the affected devices at the time? What is the
>> result of that?
>
> So it seems I can test this quickly with a manual suspend, then do the
> various checks... the issue here appears to be the auto-sleep/suspend.
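>
> (For the manual test I simply suspend from the shell, i.e. something
> like
> # systemctl suspend
> rather than waiting for the auto-suspend to kick in.)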
>
> For starters, prior to suspend, I've removed the zfs drive, and just
> left the ext4 drive in the USB caddy (it holds up to 2 drives).
>
> Prior to suspend, I get, for the 2.5 inch hdd when it has not been
> accessed for a while and I can feel it is not spinning:
>
> # hdparm -C /dev/sda
> /dev/sda:
>  drive state is:  standby
>
> Then I ls'ed a dir on that drive that had not previously been
> accessed, could feel it spin up, and got the output. I then ran hdparm
> again and, interestingly, checking a few times on the now spun-up
> drive, I get the same result as with the drive in the spun-down state:
>
> # hdparm -C /dev/sda
> /dev/sda:
>  drive state is:  standby
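>
> (I suspect hdparm -C may not be reporting the drive's real power state
> through the USB bridge. If smartctl is installed it might be worth
> cross-checking with something like
> # smartctl -i -n standby /dev/sda
> possibly adding "-d sat" for this JMicron bridge - a guess on my part,
> not something I have verified here.)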
>
> ----
> Now, after a suspend (waiting for the hdd to spin down, then for the
> monitors to blank, then another 10s) and finally waking the computer
> up (which is really too slow - 20 or 30 seconds or so, so something
> odd or challenging seems to be happening inside the kernel somewhere):
>
> # ll /dev/sd*
> ls: cannot access '/dev/sd*': No such file or directory
>
> # hdparm -C /dev/sda
> /dev/sda: No such file or directory
>
>
>> Do the drives spin back up if you use hdparm -z?
>
> Prior to suspend and wake, I get this:
>
> # hdparm -z /dev/sda
> /dev/sda:
>  re-reading partition table
>  BLKRRPART failed: Device or resource busy
>
> And again, after suspend and wake there is no more /dev/sda, or any
> /dev/sd*, so I cannot run hdparm on any such device.
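>
> (One thing I might try next time, instead of a full reboot: unbind and
> re-bind the USB host controller that the caddy hangs off, e.g.
> # echo 0000:3a:00.0 > /sys/bus/pci/drivers/xhci_hcd/unbind
> # echo 0000:3a:00.0 > /sys/bus/pci/drivers/xhci_hcd/bind
> The 0000:3a:00.0 address is the xhci_hcd controller that shows up in
> the dmesg output further down. Untested on my side, just an idea.)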
>
>
>> What is the exact kernel version you are running? Please provide both
>> the package name and exact package version, and the full output from
>> uname -a.
>
> # uname -a
> Linux zen-L7 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1
> (2023-09-29) x86_64 GNU/Linux
>
> The kernel package is exactly
> linux-image-6.1.0-13-amd64
>
>
>> Assuming that those drives are connected over USB, do they show up in
>> lsusb output while inaccessible?
>
> Prior to suspend and wake, lsusb shows me my hubs, dock, eth adaptors,
> and trackball, and possibly the following entry is the HDD dock (not
> sure):
>
> Bus 006 Device 015: ID 152d:0565 JMicron Technology Corp. / JMicron
> USA Technology Corp. JMS56x Series
>
> ... and sure enough, after suspend and wake, Bus 006 Device 015 is
> gone - it no longer exists, so it somehow has not woken up. I CAN
> still see the blue light on the hdd caddy, but the hdd remains in a
> spun-down/sleep state, and there is no /dev/sd* device.


I apologize: the paragraph above was added after I did the suspend and
wake cycle, whereas the paragraphs that follow were written before it.
Sorry for the confusion - please read the following paragraphs as part
of the "Prior to suspend..." section above.

> I do get these though (alias ll='ls -l'):
>
> # find /dev/disk/|grep usb
> /dev/disk/by-id/usb-WDC_WD20_SPZX-22UA7T0_RANDOM__3F4917AD758C-0:0
> /dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0
>
> # ll /dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0
> 0 lrwxrwxrwx 1 root root 9 20231122 10:33.10
> /dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0 ->
> ../../sda
>
> # ll /dev/sd*
> 0 brw-rw---- 1 root disk 8, 0 20231122 10:33.10 /dev/sda
>
> ... interestingly, it seems that when I formatted this drive with
> ext4, I put the ext4 filesystem on the whole disk (/dev/sda) without
> using partitions, so it is /dev/sda itself, and not a /dev/sda1, that
> holds the filesystem.
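>
> (i.e. presumably I did the equivalent of
> # mkfs.ext4 /dev/sda
> on the bare device - or on the dm-crypt mapping over it, given the
> crypted mount mentioned below - with no partition table at all, which
> would explain why there is no /dev/sda1.)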
>
>
>> Is there anything relevant in dmesg output?
>
> This looks quite suspicious (some error lines, not all of dmesg output):
>
> [42635.638996] usb 6-2.3.1: device not accepting address 15, error -62
> [42668.986050] usb 6-2.3.1: USB disconnect, device number 15
> [42668.986406] device offline error, dev sda, sector 0 op 0x1:(WRITE)
> flags 0x800 phys_seg 0 prio class 2
> [42668.988647] hub 6-2.3.2:1.0: hub_ext_port_status failed (err = -71)
> [42668.990867] hub 6-2.3.2.3:1.0: hub_ext_port_status failed (err = -71)
> [42668.990888] hub 6-2.3.2.1:1.0: hub_ext_port_status failed (err = -71)
> [42669.007554] usb 6-2.3.2.3.1: Failed to suspend device, error -71
> [42669.008775] hub 6-2.3.2:1.0: hub_ext_port_status failed (err = -71)
> [42713.495809] xhci_hcd 0000:3a:00.0: Timeout while waiting for setup
> device command
> [42713.703761] usb 6-2.3.1: device not accepting address 19, error -62
> [42713.704792] usb 6-2.3-port1: unable to enumerate USB device
> [42713.708332] usb 6-2.3.2: USB disconnect, device number 5
> [42713.708343] usb 6-2.3.2.1: USB disconnect, device number 7
>
>
> since "2.3.1" appears in the drive links above, and 6 could be "Bus
> 6". I'm not familiar with dmesg output though...
>
> I also see the following, but that was earlier in the dmesg output and
> may relate to "quitting mpv" causing my desktop/wayland (or it seems,
> gnome shell) hang (a different email thread of mine):
>
> [41107.248174] gnome-shell[45289]: segfault at 0 ip 00007f8bc835f1e0
> sp 00007ffd720f9028 error 4 in
> libmutter-11.so.0.0.0[7f8bc824f000+15a000] likely on CPU 10 (core 4,
> socket 0)
>
>
>> Are you booting the kernel with any command-line parameters? Please
>> provide the exact contents of /proc/cmdline.
>
> Fresh debian stable install, no customization by me at all:
>
> # cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-6.1.0-13-amd64
> root=UUID=9ce1e519-9712-4616-aeeb-0f858e5ac00a ro quiet
>
>
>> A spun-down drive can take a brief time to spin back up (typically on
>> the order of a few seconds), but that SHOULD be handled automatically;
>> clearly something odd is going on in your case if it doesn't.
>
> Indeed, something odd is up.
>
> So this hdd has, through suspend, disappeared. Its mount is gone, its
> /dev device is gone. There is a crypted mount which still shows, but
> of course any access gives an I/O error, and umount plus cryptsetup
> close clean that up.
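>
> (The cleanup being, roughly - with made-up names here:
> # umount /mnt/usb-backup
> # cryptsetup close usb-backup_crypt
> after which the stale mount and the dm-crypt mapping are gone.)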
>
> In the past, and I suspect still, the same problems with the zfs drive
> are impossible to clean up: run any zfs command such as `zfs list` or
> `zpool list` and the command hangs until reboot, and the drive cannot
> be used again until a reboot. So the suspend/wake cycle is very
> problematic in this instance; I have to permanently disable auto
> suspend, and remember not to manually suspend, if I am in the middle
> of work on a zfs drive. At least on an externally attached zfs drive,
> though I suspect the same problem exists with internal zfs mounts -
> zfs just does not seem to be properly integrated with the Linux
> kernel's suspend and resume, and is not currently designed to cope
> with it. Though I DO suspect that if some sort of loop-mounted zfs
> filesystem-in-a-file were mounted only inside a virtual machine, and
> exported back to the host via samba or nfs, with the zfs 'filesystem
> in a file' living on the host in an ext4 filesystem, then zfs might
> "cope" with suspend and resume, due to the cleaner nature of the
> virtual machine's "suspend" environment. Something to test one day...
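>
> (Disabling auto suspend should be doable either via the GNOME setting
> $ gsettings set org.gnome.settings-daemon.plugins.power \
>     sleep-inactive-ac-type 'nothing'
> or, more bluntly, with
> # systemctl mask sleep.target suspend.target
> and the file-backed pool idea would presumably start from something
> like
> # truncate -s 100G /srv/tank.img
> # zpool create tank /srv/tank.img
> with the path and size made up here - none of this tested yet.)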
>
>
>> “Remember when, on the Internet, nobody cared that you were a dog?”
>
> Bugger! I've been so deluded all this time...
> Oh well, better awake than deluded!
>
