I have a server with two NVMe controllers. The second one disappears
during boot (with a kernel trace).

> cat /proc/version
Linux version 4.4.0-47-generic (buildd@lcy01-03) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016

After upgrading the kernel, my ZFS pool became DEGRADED:
> zpool status
  pool: zp0
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        zp0                      DEGRADED     0     0     0
          mirror-0               DEGRADED     0     0     0
            nvme0n1              ONLINE       0     0     0
            9486952355712335023  UNAVAIL      0     0     0  was /dev/nvme1n1
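
To spot this state from a script, the device table above can be parsed for rows marked UNAVAIL. This is only a minimal sketch: the function name `unavailable_devices` and the regular expression are my own assumptions, the sample text is copied from the output above, and it assumes plain integer error counters (real output can abbreviate large counts).

```python
import re

# Device rows copied from the `zpool status` output above.
SAMPLE = """\
        NAME                     STATE     READ WRITE CKSUM
        zp0                      DEGRADED     0     0     0
          mirror-0               DEGRADED     0     0     0
            nvme0n1              ONLINE       0     0     0
            9486952355712335023  UNAVAIL      0     0     0  was /dev/nvme1n1
"""

# Hypothetical pattern: name, state, three integer counters, optional "was <path>".
ROW = re.compile(
    r"^\s+(\S+)\s+(ONLINE|DEGRADED|UNAVAIL|FAULTED|OFFLINE|REMOVED)"
    r"\s+\d+\s+\d+\s+\d+(?:\s+was\s+(\S+))?"
)


def unavailable_devices(status_text):
    """Return (name, former_path) for every row whose state is UNAVAIL."""
    found = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m and m.group(2) == "UNAVAIL":
            found.append((m.group(1), m.group(3)))
    return found


if __name__ == "__main__":
    for name, former in unavailable_devices(SAMPLE):
        print(name, "was", former)
```

In practice the text would come from running `zpool status` rather than from a pasted sample.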


Only ONE controller is listed:

> nvme list
Node             SN                   Model                                    Version  Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- -------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     CVMD4391006B800GGN   INTEL SSDPE2ME800G4                      1.0      1         800,17  GB / 800,17  GB    512   B +  0 B   8DV10102

The bug isn't fixed for me. The second controller still fails to probe:

[   68.950042] nvme 0000:82:00.0: I/O 0 QID 0 timeout, disable controller
[   69.054149] nvme 0000:82:00.0: Cancelling I/O 0 QID 0
[   69.054182] nvme 0000:82:00.0: Identify Controller failed (-4)
[   69.060132] nvme 0000:82:00.0: Removing after probe failure
[   69.060284] iounmap: bad address ffffc9000cf34000
[   69.065020] CPU: 14 PID: 247 Comm: kworker/14:1 Tainted: P           OE   4.4.0-47-generic #68-Ubuntu
[   69.065034] Hardware name: Supermicro SYS-F618R2-RC1+/X10DRFR-N, BIOS 2.0 01/27/2016
[   69.065040] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
[   69.065050]  0000000000000286 00000000e10d6171 ffff8820340efce0 ffffffff813f5aa3
[   69.065052]  ffff88203454b4f0 ffffc9000cf34000 ffff8820340efd00 ffffffff8106bdff
[   69.065054]  ffff88203454b4f0 ffff88203454b658 ffff8820340efd10 ffffffff8106be3c
[   69.065056] Call Trace:
[   69.065068]  [<ffffffff813f5aa3>] dump_stack+0x63/0x90
[   69.065089]  [<ffffffff8106bdff>] iounmap.part.1+0x7f/0x90
[   69.065093]  [<ffffffff8106be3c>] iounmap+0x2c/0x30
[   69.065097]  [<ffffffffc01c364a>] nvme_dev_unmap.isra.35+0x1a/0x30 [nvme]
[   69.065099]  [<ffffffffc01c475e>] nvme_remove+0xce/0xe0 [nvme]
[   69.065108]  [<ffffffff81447009>] pci_device_remove+0x39/0xc0
[   69.065117]  [<ffffffff815585e1>] __device_release_driver+0xa1/0x150
[   69.065119]  [<ffffffff815586b3>] device_release_driver+0x23/0x30
[   69.065123]  [<ffffffff8143fa7a>] pci_stop_bus_device+0x8a/0xa0
[   69.065125]  [<ffffffff8143fbca>] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[   69.065129]  [<ffffffffc01c309c>] nvme_remove_dead_ctrl_work+0x3c/0x50 [nvme]
[   69.065136]  [<ffffffff8109a4a5>] process_one_work+0x165/0x480
[   69.065138]  [<ffffffff8109a80b>] worker_thread+0x4b/0x4c0
[   69.065141]  [<ffffffff8109a7c0>] ? process_one_work+0x480/0x480
[   69.065143]  [<ffffffff8109a7c0>] ? process_one_work+0x480/0x480
[   69.065147]  [<ffffffff810a09e8>] kthread+0xd8/0xf0
[   69.065150]  [<ffffffff810a0910>] ? kthread_create_on_node+0x1e0/0x1e0
[   69.065157]  [<ffffffff8183538f>] ret_from_fork+0x3f/0x70
[   69.065158]  [<ffffffff810a0910>] ? kthread_create_on_node+0x1e0/0x1e0
[   69.065161] Trying to free nonexistent resource <00000000fbd10000-00000000fbd13fff>
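
To check several machines for the same failure, the kernel log can be scanned for the "Removing after probe failure" message. This is a rough sketch, not a definitive tool: the function name `probe_failures` and the regular expression are assumptions of mine, and the sample lines are copied from the trace above.

```python
import re

# Lines copied from the dmesg excerpt above.
SAMPLE = """\
[   68.950042] nvme 0000:82:00.0: I/O 0 QID 0 timeout, disable controller
[   69.054149] nvme 0000:82:00.0: Cancelling I/O 0 QID 0
[   69.054182] nvme 0000:82:00.0: Identify Controller failed (-4)
[   69.060132] nvme 0000:82:00.0: Removing after probe failure
[   69.060284] iounmap: bad address ffffc9000cf34000
"""

# Hypothetical pattern: "[timestamp] nvme <domain:bus:dev.fn>: <message>".
NVME_MSG = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+nvme\s+"
    r"([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(.*)$"
)


def probe_failures(log_text):
    """Map PCI address -> timestamp of its 'Removing after probe failure' line."""
    failed = {}
    for line in log_text.splitlines():
        # strip() tolerates logs that were indented when quoted in an email.
        m = NVME_MSG.match(line.strip())
        if m and "Removing after probe failure" in m.group(3):
            failed[m.group(2)] = float(m.group(1))
    return failed


if __name__ == "__main__":
    for address, when in probe_failures(SAMPLE).items():
        print(f"{address} removed at {when:.6f}s after boot")
```

The text to scan would normally be the output of `dmesg`; the sample is embedded only so the sketch runs on its own.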

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1626894

Title:
  nvme drive probe failure

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  After upgrading from linux-image-4.4.0-38-generic to proposed update
  linux-image-4.4.0-39-generic, NVMe drives are no longer working. dmesg
  shows a probe failure.

  On the previous kernel version everything is working as expected.
  ----------------->%-----------------
  [    1.005243] Hardware name: FUJITSU D3417-B1/D3417-B1, BIOS V5.0.0.11 R1.12.0.SR.2 for D3417-B1x               04/01/2016
  [    1.005349] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
  [    1.005484]  0000000000000286 00000000b6c91251 ffff880fe6e8bce0 ffffffff813f1f83
  [    1.005800]  ffff880fe02150f0 ffffc90006a7c000 ffff880fe6e8bd00 ffffffff8106bdff
  [    1.006117]  ffff880fe02150f0 ffff880fe0215258 ffff880fe6e8bd10 ffffffff8106be3c
  [    1.006433] Call Trace:
  [    1.006509]  [<ffffffff813f1f83>] dump_stack+0x63/0x90
  [    1.006589]  [<ffffffff8106bdff>] iounmap.part.1+0x7f/0x90
  [    1.006668]  [<ffffffff8106be3c>] iounmap+0x2c/0x30
  [    1.006770]  [<ffffffffc007a64a>] nvme_dev_unmap.isra.35+0x1a/0x30 [nvme]
  [    1.007048]  [<ffffffffc007b75e>] nvme_remove+0xce/0xe0 [nvme]
  [    1.007140]  [<ffffffff81443409>] pci_device_remove+0x39/0xc0
  [    1.007220]  [<ffffffff815549f1>] __device_release_driver+0xa1/0x150
  [    1.007301]  [<ffffffff81554ac3>] device_release_driver+0x23/0x30
  [    1.007382]  [<ffffffff8143be7a>] pci_stop_bus_device+0x8a/0xa0
  [    1.007462]  [<ffffffff8143bfca>] pci_stop_and_remove_bus_device_locked+0x1a/0x30
  [    1.007559]  [<ffffffffc007a09c>] nvme_remove_dead_ctrl_work+0x3c/0x50 [nvme]
  [    1.007642]  [<ffffffff8109a3e5>] process_one_work+0x165/0x480
  [    1.007722]  [<ffffffff8109a74b>] worker_thread+0x4b/0x4c0
  [    1.007801]  [<ffffffff8109a700>] ? process_one_work+0x480/0x480
  [    1.007881]  [<ffffffff810a0928>] kthread+0xd8/0xf0
  [    1.007959]  [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
  [    1.008041]  [<ffffffff81831a8f>] ret_from_fork+0x3f/0x70
  [    1.008120]  [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
  [    1.008222] Trying to free nonexistent resource <00000000f7100000-00000000f7103fff>
  [    1.008276] genirq: Flags mismatch irq 0. 00000080 (nvme1q0) vs. 00015a00 (timer)
  [    1.008281] Trying to free nonexistent resource <000000000000d000-000000000000d0ff>
  [    1.008282] nvme 0000:02:00.0: Removing after probe failure
  [    1.008645] Trying to free nonexistent resource <000000000000e000-000000000000e0ff>
  [    1.027213] iounmap: bad address ffffc90006ae0000
  [    1.027456] CPU: 2 PID: 86 Comm: kworker/2:1 Not tainted 4.4.0-39-generic #59-Ubuntu
  -----------------%<-----------------
