------- Comment From ukri...@us.ibm.com 2017-07-27 18:33 EDT-------
I just opened BZ 157097 for the same issue and was referred to this bug,
which addresses the problem I was debugging. We also need upstream commit
be5c5e843c4afa1c8397cb740b6032bd4142f32d pulled into the Xenial 16.04.3
HWE v4.10 kernel.

The bad commit 2337d207288f163e10bd8d4d7eeb0c1c75046a0c is included in
the 16.04.3 HWE v4.10 kernel, so the upstream fix is needed in Xenial
(16.04.3) as well, if possible. I know we are cutting it close to the
16.04.3 release date, but this is a regression, so it would be good to
have the fixing commit.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1684054

Title:
  [LTCTest][Opal][FW860.20] HMI recoverable errors failed to recover and
  system goes to dump state.

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Zesty:
  New

Bug description:
  == Comment: #0 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-17 06:08:41 ==
  ---Problem Description---
  HMI recoverable error injection tests lead to a system checkstop followed by a
  system dump on Ubuntu 17.04 with kernel 4.10.0-19-generic (ppc64le).
   
  Contact Information = ppaid...@in.ibm.com 
   
  ---uname output---
  #21-Ubuntu SMP Thu Apr 6 17:03:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = PowerNV 8284-22A 
   
  ---System Hang---
   The system is in a dumping state. After the dump finishes, the system will IPL to the OS again.
   
  ---Debugger---
  A debugger is not configured
   

  == Comment: #3 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-17 06:12:51 ==
  # uname -a
  #21-Ubuntu SMP Thu Apr 6 17:03:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
  # cat /etc/os-release 
  NAME="Ubuntu"
  VERSION="17.04 (Zesty Zapus)"
  ID=ubuntu
  ID_LIKE=debian
  PRETTY_NAME="Ubuntu 17.04"
  VERSION_ID="17.04"
  HOME_URL="https://www.ubuntu.com/"
  SUPPORT_URL="https://help.ubuntu.com/"
  BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
  PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
  VERSION_CODENAME=zesty
  UBUNTU_CODENAME=zesty
  root@p8wookie:~#

  == Comment: #4 - Kevin W. Rudd <ru...@us.ibm.com> - 2017-04-17
  11:10:22 ==

  
  == Comment: #5 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 2017-04-17 13:34:03 ==
  It looks like the commit below is the culprit:

  =======================================
  commit 2337d207288f163e10bd8d4d7eeb0c1c75046a0c
  Author: Nicholas Piggin <npig...@gmail.com>
  Date:   Fri Jan 27 14:24:33 2017 +1000

      powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts
      
      The branch from hmi_exception_early to hmi_exception_realmode must use
      a "relocatable-style" branch, because it is branching from unrelocated
      exception code to beyond __end_interrupts.
      
      Signed-off-by: Nicholas Piggin <npig...@gmail.com>
      Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
  =======================================

  With the above commit, hmi_exception_realmode() is now called using
  bctrl, which ends up corrupting the TOC (r2) value. Any further access
  through the new r2 results in unpredictable behaviour.

  ----------------------------------------
  c000000000025f50 <hmi_exception_realmode>:
  c000000000025f50:       3a 01 4c 3c     addis   r2,r12,314
  c000000000025f54:       b0 01 42 38     addi    r2,r2,432
  c000000000025f58:       a6 02 08 7c     mflr    r0
  -----------------------------------------

  With the above commit, the hmi_exception_early() code jumps to
  c000000000025f50 (hmi_exception_realmode+0x0), which then computes a
  new value for r2 from whatever r12 holds.

  If we revert the above commit, the code jumps to c000000000025f58
  (hmi_exception_realmode+0x8), skipping the r2 setup, and the HMI
  handler works fine.
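
For readers who want to check the arithmetic, the two-instruction prologue in the listing above can be decoded and evaluated by hand. The following is only an illustrative Python sketch, not kernel code: the helper names are made up for this note, and it assumes the standard PowerPC D-form layout (6-bit opcode, 5-bit RT, 5-bit RA, 16-bit signed immediate) for the little-endian bytes shown by objdump.

```python
# Illustrative model only: decode the two prologue instructions from the
# listing and reproduce the r2 (TOC) value they compute from r12.
MASK64 = (1 << 64) - 1

def decode_d_form(le_bytes):
    """Decode a little-endian PowerPC D-form word into (opcode, rt, ra, si)."""
    word = int.from_bytes(bytes.fromhex(le_bytes), "little")
    opcode = word >> 26
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    si = word & 0xFFFF
    if si & 0x8000:              # sign-extend the 16-bit immediate
        si -= 0x10000
    return opcode, rt, ra, si

def global_entry_r2(r12):
    """r2 after 'addis r2,r12,314' followed by 'addi r2,r2,432'."""
    return (r12 + (314 << 16) + 432) & MASK64

ENTRY = 0xC000000000025F50       # hmi_exception_realmode+0x0 in the listing

print(decode_d_form("3a 01 4c 3c"))   # (15, 2, 12, 314): addis r2,r12,314
print(decode_d_form("b0 01 42 38"))   # (14, 2, 2, 432):  addi  r2,r2,432
print(hex(global_entry_r2(ENTRY)))    # 0xc0000000013c6100
```

The last value is what r2 would be if r12 held the function's own entry address when execution reaches +0x0. Any other r12 value shifts r2 by the same amount, which is consistent with the unpredictable accesses described above.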

  After reverting the above patch I no longer see this issue. I have
  rebuilt the Ubuntu kernel with the patch reverted, and you can find
  the kernel rpm at:

  Can you please retry your tests with the above kernel and see whether
  the issue still persists?

  == Comment: #6 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 2017-04-17 23:02:31 ==
  Spoke to Michael Ellerman this morning. He helped me identify the root
  cause, and the fix patch is below:

  diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
  index 857bf7c5b946..7cfeb8768587 100644
  --- a/arch/powerpc/kernel/exceptions-64s.S
  +++ b/arch/powerpc/kernel/exceptions-64s.S
  @@ -982,7 +982,7 @@ TRAMP_REAL_BEGIN(hmi_exception_early)
        EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN)
        EXCEPTION_PROLOG_COMMON_3(0xe60)
        addi    r3,r1,STACK_FRAME_OVERHEAD
  -     BRANCH_LINK_TO_FAR(r4, hmi_exception_realmode)
  +     BRANCH_LINK_TO_FAR(r12, hmi_exception_realmode)
        /* Windup the stack. */
        /* Move original HSRR0 and HSRR1 into the respective regs */
        ld      r9,_MSR(r1)
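
The one-register change in the patch can be reasoned about with a small model. This is a simplified Python sketch under stated assumptions, not the kernel macro: it assumes BRANCH_LINK_TO_FAR loads the target address into the named register and then branches through CTR with bctrl (comment #5 only confirms the bctrl part), and the stale r12 value is made up for illustration. The offset comes from the addis/addi pair shown in comment #5.

```python
# Simplified model: why passing r12 instead of r4 yields a valid TOC.
MASK64 = (1 << 64) - 1
TARGET = 0xC000000000025F50        # hmi_exception_realmode+0x0 (comment #5)
TOC_OFFSET = (314 << 16) + 432     # addis r2,r12,314 ; addi r2,r2,432
EXPECTED_TOC = (TARGET + TOC_OFFSET) & MASK64

def branch_link_to_far(regs, reg, target):
    """Assumed behaviour: load target into `reg`, mtctr, bctrl, then run
    the callee's global-entry prologue, which derives r2 from r12."""
    out = dict(regs)
    out[reg] = target              # far address goes into the chosen register
    out["ctr"] = out[reg]          # mtctr reg ; bctrl
    out["r2"] = (out["r12"] + TOC_OFFSET) & MASK64
    return out

# Hypothetical register state on entry; the stale r12 value is invented.
regs = {"r2": EXPECTED_TOC, "r4": 0, "r12": 0x1234, "ctr": 0}

broken = branch_link_to_far(regs, "r4", TARGET)    # before the patch
fixed = branch_link_to_far(regs, "r12", TARGET)    # after the patch

print(hex(broken["r2"]), broken["r2"] == EXPECTED_TOC)
print(hex(fixed["r2"]), fixed["r2"] == EXPECTED_TOC)
```

In this model the r4 variant leaves r12 untouched, so the prologue builds r2 from a stale value, while the r12 variant makes r12 equal to the entry address and r2 comes out as expected.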

  == Comment: #7 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> -
  2017-04-18 01:52:03 ==

  
  == Comment: #8 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-18 01:53:57 ==
  Hi Mahesh,
  I tested all the HMI recoverable errors on the patched kernel below and
  attached the corresponding execution logs. All tests are working fine.

  #21 SMP Mon Apr 17 12:58:30 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

  
  Thanks

  == Comment: #9 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 2017-04-18 06:07:56 ==
  (In reply to comment #8)
  > Hi Mahesh
  > Tested all the HMI Recoverable errors on the below patched kernel, attached
  > the corresponding executing logs. All tests are working fine.
  > 
  > Linux p8wookie 4.10.0-19.bz153487-generic #21 SMP Mon Apr 17 12:58:30 EDT
  > 2017 ppc64le ppc64le ppc64le GNU/Linux
  > 
  > 
  > Thanks

  Thanks. Michael has posted a fix for this upstream:

  http://patchwork.ozlabs.org/patch/751647/

  I will rebuild the Ubuntu kernel with the above patch.

  == Comment: #12 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-18 09:27:59 ==
  (In reply to comment #11)
  > > 
  > > https://git.kernel.org/powerpc/c/be5c5e843c4afa1c8397cb740b6032
  > 
  > I have built new kernel with above patch and you can find it below path
  > 
  >:/home2/mahesh/u2/bz153487v2/linux-image-4.10.0-19.bz153487v2-generic_4.10.0-19.bz153487v2.21_ppc64el.deb

  
  Tested with this new patched kernel; all tests are working fine.

  Linux p8wookie 4.10.0-19.bz153487v2-generic #21 SMP Tue Apr 18
  07:43:13 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

  I will attach the full execution logs here.

  == Comment: #13 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> -
  2017-04-18 09:29:43 ==

  
  == Comment: #14 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 2017-04-19 03:52:18 ==
  (In reply to comment #12)
  > (In reply to comment #11)
  > > > 
  > > > https://git.kernel.org/powerpc/c/be5c5e843c4afa1c8397cb740b6032
  > > 

  Thanks for testing. We need to mirror this to Ubuntu so that the fix
  patch can be included.

  > 
  > Linux p8wookie 4.10.0-19.bz153487v2-generic #21 SMP Tue Apr 18 07:43:13 EDT
  > 2017 ppc64le ppc64le ppc64le GNU/Linux
  > 
  > Will attach is full the execution logs here.

