------- Comment From [email protected] 2016-01-11 10:34 EDT-------
Status update:

The root cause was found, and a patch is provided.
The problem happens when a DLPAR operation on a PCI device is performed in an
LPAR that had no PCI devices present at boot time. When DDW is being enabled
(in function query_ddw() specifically), a NULL pointer dereference happens
because a member of struct eeh_dev is NULL.

This happens because EEH is not initialized correctly: PCI devices are not
probed as expected, so the eeh_dev struct is never initialized.

Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in
eeh_add_device_early() on Cell") added a check to avoid an oops on the Cell
architecture in eeh_add_device_early(), the function used to probe PCI
devices during hotplug/DLPAR operations. The check is performed by
evaluating the return value of the eeh_enabled() function.

The issue then happens because, since we have no PCI device at boot time,
EEH is not enabled, so this check fails in eeh_add_device_early() and the
newly added device is not probed. Our patch changes the way the arch
checking is done, so this bug no longer happens.
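
To make the failure mode above easier to follow, here is a minimal user-space
sketch of the logic. It is not the kernel code and not the submitted patch:
every struct, field and function name in it (platform_model, eeh_dev_model,
add_device_early_old/new, dlpar_add_is_safe) is hypothetical, and it only
assumes that the old check asked "is EEH already enabled?" while the fixed
check asks "can this platform support EEH?".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's real definitions. */
struct eeh_dev_model {
    void *pe;               /* stays NULL until the device is probed */
};

struct platform_model {
    bool supports_eeh;      /* assumed: true on pSeries, false on Cell */
    bool eeh_enabled;       /* assumed: set at boot only if a PCI device was probed */
};

static int dummy_pe;        /* placeholder object a probed device points to */

/* Old check (sketch): skip the probe unless EEH is already enabled. */
static void add_device_early_old(const struct platform_model *p,
                                 struct eeh_dev_model *edev)
{
    if (!p->eeh_enabled)
        return;             /* also bails out on an LPAR booted without PCI devices */
    edev->pe = &dummy_pe;
}

/* Changed check (sketch): skip the probe only if the platform lacks EEH. */
static void add_device_early_new(const struct platform_model *p,
                                 struct eeh_dev_model *edev)
{
    if (!p->supports_eeh)
        return;             /* still protects platforms without EEH */
    edev->pe = &dummy_pe;
}

/* Stands in for the later code that dereferences the member.
 * Returns false where the real kernel would hit the NULL dereference. */
static bool ddw_query_is_safe(const struct eeh_dev_model *edev)
{
    return edev->pe != NULL;
}

/* Model one DLPAR add and report whether the later dereference is safe. */
static bool dlpar_add_is_safe(bool supports_eeh, bool pci_at_boot, bool patched)
{
    struct platform_model p = { supports_eeh, supports_eeh && pci_at_boot };
    struct eeh_dev_model edev = { NULL };

    if (patched)
        add_device_early_new(&p, &edev);
    else
        add_device_early_old(&p, &edev);

    return ddw_query_is_safe(&edev);
}
```

In this model, dlpar_add_is_safe(true, false, false) is the failing case
described above (EEH-capable platform, no PCI device at boot, old check),
and dlpar_add_is_safe(true, false, true) is the same scenario with the
changed check.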

The patch was submitted upstream. I don't know the exact procedure regarding
Canonical - I think we should wait for upstream acceptance and then ask
Canonical to add the patch to Ubuntu's 14.04.4/15.10/16.04 kernels.
The patch description provides a few more details about the issue and the
proposed solution.

Link to patch on ppc-dev list:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-January/137695.html

Thanks Shryia for all the help provided.
Cheers,

Guilherme

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1486180

Title:
  Kernel OOPS during DLPAR operation with Fibre Channel adapter

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1486180/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
