On 2025-12-22 17:58, Warner Losh wrote:
On Sun, Dec 21, 2025 at 8:37 AM Alexander Leidinger <[email protected]> wrote:
On 2025-12-14 14:05, Warner Losh wrote:
Let's do one issue at a time. There's too much missing info. Top posting since there's not a lot of context to this request.

The disk has now died completely, so the CRC errors can no longer be reproduced.

First, let's start with pciconf -l of the nvme drive. I have a strong idea, but need some data.

I already provided this privately along with some other data. Here it is for the list as well, so that people are aware that there is currently an issue with such drives:
nvme0@pci0:5:0:0: class=0x010802 rev=0x00 hdr=0x00 vendor=0x144d device=0xa809 subvendor=0x144d subdevice=0xa801
Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V
Yeah, so far this is the only report I've received, and there's not enough data in it to reproduce the problem with any of the dozen NVMe drives I have, or to spot a difference from what I know the code checks. What I do know is that it works when nvme is compiled into the kernel together with CAM.
CAM is in the kernel, nvme is loaded as a module (from 15-current):
---snip---
# kldstat | egrep '(nvm|cam)'
2 1 0xffffffff811e3000 20db8 nvme.ko
---snip---
I will do a clean rebuild with the most recent 16-current and provide a full dmesg if this still doesn't work.
Bye, Alexander.
Warner

Bye,
Alexander.

Also, the disk report needs full logs, with and without the settings, that have the "uncorrectable" messages in them. I'd expect that a shorter timeout would lead to different behavior, but maybe that error syndrome isn't one I've seen. It would also be helpful to know which of the timeouts changes the behavior...

Warner

On Sun, Dec 14, 2025, 5:06 AM Alexander Leidinger <[email protected]> wrote:
Hi Warner,

I tried to update a 15-current (as of 2025-11-27-110715) to a recent 16-current (as of 2025-12-13-132815). It fails to import a pool due to a missing nvme. I also have a broken HD in this system; I mention it to be on the safe side.

This is from 15-current:
---snip---
NAME                                STATE     READ WRITE CKSUM
rpool                               DEGRADED     0     0     0
  mirror-0                          DEGRADED     0     0     0
    diskid/DISK-WD-WCC4N4KLEZT7p3   ONLINE       0     0     0
    diskid/DISK-WD-WCC4N1DF9DA2p3   ONLINE       0     0     0
    diskid/DISK-WD-WX52D625R0NTp3   ONLINE       0     0     0
    diskid/DISK-WD-WCC4N1PYJ3F8p3   OFFLINE      0     0     0
logs
  diskid/DISK-493504058890547p1     ONLINE       0     0     0
cache
  diskid/DISK-493504058890547p2     ONLINE       0     0     0

NAME                                STATE     READ WRITE CKSUM
space                               DEGRADED     0     0     0
  raidz2-0                          DEGRADED     0     0     0
    diskid/DISK-WD-WCC4N4KLEZT7p4   ONLINE       0     0     0
    diskid/DISK-WD-WCC4N1DF9DA2p4   ONLINE       0     0     0
    diskid/DISK-WD-WX52D625R0NTp4   ONLINE       0     0     0
    diskid/DISK-WD-WX52D625R2TPp4   ONLINE       0     0     0
    diskid/DISK-WD-WCC4N1PYJ3F8p4   OFFLINE      0     0     0
logs
  diskid/DISK-S649NL0T819360Vp2     ONLINE       0     0     0
cache
  diskid/DISK-S649NL0T819360Vp3     ONLINE       0     0     0
---snip---
The partitions marked OFFLINE are on the same HD (the broken one). The DISK-S649NL0T819360V device, used as log and cache in the second pool, causes the issue on 16-current.

On 16-current I get "uncorrectable parity/CRC error" messages on boot from the broken disk. I used this to get rid of those errors:
---snip---
# grep kern.cam /tmp/be_mount.MhLw/boot/loader.conf
kern.cam.tur_timeout="60"
kern.cam.inquiry_timeout="60"
kern.cam.modesense_timeout="60"
---snip---
But the second pool ("space") fails to import.
When I import it via "zpool import -m space", it shows me that the log and cache devices (different partitions on the same hardware) are not available.

This is the device in question as seen from 15-current:
---snip---
nda0: <Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V>
nda0: Serial Number S649NL0T819360V
[1] nda0: nvme version 1.4
nda0: 953869MB (1953525168 512 byte sectors)
[1] GEOM: new disk nda0
...
[1] pass6 at nvme0 bus 0 scbus6 target 0 lun 1
pass6: <Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V>
pass6: Serial Number S649NL0T819360V
[1] pass6: nvme version 1.4
---snip---

In case you need some info from the 15- or 16-current BE, which info do you need?

Bye,
Alexander.

-- 
http://www.Leidinger.net [email protected]: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org [email protected] : PGP 0x8F31830F9F2772BF
