------- Comment From gbert...@br.ibm.com 2016-08-08 15:09 EDT-------
(In reply to comment #13)
> (In reply to comment #9)
> > I have tested this with the test kernel available at
> > http://people.canonical.com/~rtg/eeh-lp1602724/ on Ubuntu 16.04.1.
> >
> > Test Kernel :
> > root@everest-lp13-leaf:~# uname -a
> > Linux everest-lp13-leaf 4.4.0-32-generic #51 SMP Tue Jul 19 21:41:04 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
> >
> >
> >
> > The NVMe device (leaf) recovers for the first 5 EEH triggers, but the
> > kernel crashes on the 6th EEH trigger.
> >
>
> This is most likely fixed by
>
> http://lists.infradead.org/pipermail/linux-nvme/2016-August/005670.html
>   ("[PATCH v2] nvme: Suspend all queues before deletion")
>
> Which is not upstream yet.  Once it gets accepted, it should be pushed to
> Ubuntu via another bugzilla.  When that happens, we'll need a new test
> kernel for this one.

Canonical,

For a little more context, we have identified an issue in the test
kernel you provided. After a sequence of 6 EEHs, the device driver (DD)
attempts to remove the adapter, which ends up hitting a BUG_ON.

We think the above patch fixes this, but that is not yet confirmed. Can
you provide a kernel with that patch also applied for verification?
It is not upstream yet, but it has already been acked by the
driver maintainer, Keith Busch.

https://bugs.launchpad.net/bugs/1602724

Title:
  Ubuntu 16.04 - Full EEH Recovery Support for NVMe devices

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed

Bug description:
  == Comment: #0 - Heitor Ricardo Alves de Siqueira <heit...@br.ibm.com> - 2016-07-12 12:54:27 ==
  The current nvme driver in the Ubuntu 16.04 kernel does not handle error recovery; it is missing some patches from the upstream nvme driver.

  We would like to ask Canonical to cherry pick the following patches for the 16.04 kernel, if possible:
      * 9396dec916c0 ("nvme: use a work item to submit async event requests")
      * 79f2b358c9ba ("nvme: don't poll the CQ from the kthread")
      * 2d55cd5f511d ("nvme: replace the kthread with a per-device watchdog timer")
      * 9bf2b972afea ("NVMe: Fix reset/remove race")
      * c875a7093f04 ("nvme: Avoid reset work on watchdog timer function during error recovery")
      * a5229050b69c ("NVMe: Always use MSI/MSI-x interrupts")
