This bug was fixed in the package linux - 4.15.0-23.25

---------------
linux (4.15.0-23.25) bionic; urgency=medium

  * linux: 4.15.0-23.25 -proposed tracker (LP: #1772927)

  * arm64 SDEI support needs trampoline code for KPTI (LP: #1768630)
    - arm64: mmu: add the entry trampolines start/end section markers into
      sections.h
    - arm64: sdei: Add trampoline code for remapping the kernel

  * Some PCIe errors not surfaced through rasdaemon (LP: #1769730)
    - ACPI: APEI: handle PCIe AER errors in separate function
    - ACPI: APEI: call into AER handling regardless of severity

  * qla2xxx: Fix page fault at kmem_cache_alloc_node() (LP: #1770003)
    - scsi: qla2xxx: Fix session cleanup for N2N
    - scsi: qla2xxx: Remove unused argument from
      qlt_schedule_sess_for_deletion()
    - scsi: qla2xxx: Serialize session deletion by using work_lock
    - scsi: qla2xxx: Serialize session free in qlt_free_session_done
    - scsi: qla2xxx: Don't call dma_free_coherent with IRQ disabled.
    - scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
    - scsi: qla2xxx: Prevent relogin trigger from sending too many commands
    - scsi: qla2xxx: Fix double free bug after firmware timeout
    - scsi: qla2xxx: Fixup locking for session deletion

  * Several hisi_sas bug fixes (LP: #1768974)
    - scsi: hisi_sas: dt-bindings: add an property of signal attenuation
    - scsi: hisi_sas: support the property of signal attenuation for v2 hw
    - scsi: hisi_sas: fix the issue of link rate inconsistency
    - scsi: hisi_sas: fix the issue of setting linkrate register
    - scsi: hisi_sas: increase timer expire of internal abort task
    - scsi: hisi_sas: remove unused variable hisi_sas_devices.running_req
    - scsi: hisi_sas: fix return value of hisi_sas_task_prep()
    - scsi: hisi_sas: Code cleanup and minor bug fixes

  * [bionic] machine stuck and bonding not working well when nvmet_rdma module
    is loaded (LP: #1764982)
    - nvmet-rdma: Don't flush system_wq by default during remove_one
    - nvme-rdma: Don't flush delete_wq by default during remove_one

  * Warnings/hang during error handling of SATA disks on SAS controller
    (LP: #1768971)
    - scsi: libsas: defer ata device eh commands to libata

  * Hotplugging a SATA disk into a SAS controller may cause crash (LP: #1768948)
    - ata: do not schedule hot plug if it is a sas host

  * ISST-LTE:pKVM:Ubuntu1804: rcu_sched self-detected stall on CPU follow by CPU
    ATTEMPT TO RE-ENTER FIRMWARE! (LP: #1767927)
    - powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
    - powerpc/64s: return more carefully from sreset NMI
    - powerpc/64s: sreset panic if there is no debugger or crash dump handlers

  * fsnotify: Fix fsnotify_mark_connector race (LP: #1765564)
    - fsnotify: Fix fsnotify_mark_connector race

  * Hang on network interface removal in Xen virtual machine (LP: #1771620)
    - xen-netfront: Fix hang on device removal

  * HiSilicon HNS NIC names are truncated in /proc/interrupts (LP: #1765977)
    - net: hns: Avoid action name truncation

  * Ubuntu 18.04 kernel crashed while in degraded mode (LP: #1770849)
    - SAUCE: powerpc/perf: Fix memory allocation for core-imc based on
      num_possible_cpus()

  * Switch Build-Depends: transfig to fig2dev (LP: #1770770)
    - [Config] update Build-Depends: transfig to fig2dev

  * smp_call_function_single/many core hangs with stop4 alone (LP: #1768898)
    - cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer
      interrupt

  * Add d-i support for Huawei NICs (LP: #1767490)
    - d-i: add hinic to nic-modules udeb

  * unregister_netdevice: waiting for eth0 to become free. Usage count = 5
    (LP: #1746474)
    - xfrm: reuse uncached_list to track xdsts

  * Include nfp driver in linux-modules (LP: #1768526)
    - [Config] Add nfp.ko to generic inclusion list

  * Kernel panic on boot (m1.small in cn-north-1) (LP: #1771679)
    - x86/xen: Reset VCPU0 info pointer after shared_info remap

  * CVE-2018-3639 (x86)
    - x86/bugs: Fix the parameters alignment and missing void
    - KVM: SVM: Move spec control call after restore of GS
    - x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
    - x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
    - x86/cpufeatures: Disentangle SSBD enumeration
    - x86/cpufeatures: Add FEATURE_ZEN
    - x86/speculation: Handle HT correctly on AMD
    - x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
    - x86/speculation: Add virtualized speculative store bypass disable support
    - x86/speculation: Rework speculative_store_bypass_update()
    - x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
    - x86/bugs: Expose x86_spec_ctrl_base directly
    - x86/bugs: Remove x86_spec_ctrl_set()
    - x86/bugs: Rework spec_ctrl base and mask logic
    - x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
    - KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
    - x86/bugs: Rename SSBD_NO to SSB_NO
    - bpf: Prevent memory disambiguation attack
    - KVM: VMX: Expose SSBD properly to guests.

  * Suspend to idle: Open lid didn't resume (LP: #1771542)
    - ACPI / PM: Do not reconfigure GPEs for suspend-to-idle

  * Fix initialization failure detection in SDEI for device-tree based systems
    (LP: #1768663)
    - firmware: arm_sdei: Fix return value check in sdei_present_dt()

  * No driver for Huawei network adapters on arm64 (LP: #1769899)
    - net-next/hinic: add arm64 support

  * CVE-2018-1092
    - ext4: fail ext4_iget for root directory if unallocated

  * kernel 4.15 breaks nouveau on Lenovo P50 (LP: #1763189)
    - drm/nouveau: Fix deadlock in nv50_mstm_register_connector()

  * update-initramfs not adding i915 GuC firmware for Kaby Lake, firmware fails
    to load (LP: #1728238)
    - Revert "UBUNTU: SAUCE: (no-up) i915: Remove MODULE_FIRMWARE statements for
      unreleased firmware"

  * Battery drains when laptop is off  (shutdown) (LP: #1745646)
    - PCI / PM: Check device_may_wakeup() in pci_enable_wake()

  * Dell Latitude 5490/5590 BIOS update 1.1.9 causes black screen at boot
    (LP: #1764194)
    - drm/i915/bios: filter out invalid DDC pins from VBT child devices

  * Intel 9462 A370:42A4 doesn't work (LP: #1748853)
    - iwlwifi: add shared clock PHY config flag for some devices
    - iwlwifi: add a bunch of new 9000 PCI IDs

  * Fix an issue that some PCI devices get incorrectly suspended (LP: #1764684)
    - PCI / PM: Always check PME wakeup capability for runtime wakeup support

  * [SRU][Bionic/Artful] fix false positives in W+X checking (LP: #1769696)
    - init: fix false positives in W+X checking

  * Bionic update to v4.15.18 stable release (LP: #1769723)
    - netfilter: ipset: Missing nfnl_lock()/nfnl_unlock() is added to
      ip_set_net_exit()
    - cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
    - rds: MP-RDS may use an invalid c_path
    - slip: Check if rstate is initialized before uncompressing
    - vhost: fix vhost_vq_access_ok() log check
    - l2tp: fix races in tunnel creation
    - l2tp: fix race in duplicate tunnel detection
    - ip_gre: clear feature flags when incompatible o_flags are set
    - vhost: Fix vhost_copy_to_user()
    - lan78xx: Correctly indicate invalid OTP
    - media: v4l2-compat-ioctl32: don't oops on overlay
    - media: v4l: vsp1: Fix header display list status check in continuous mode
    - ipmi: Fix some error cleanup issues
    - parisc: Fix out of array access in match_pci_device()
    - parisc: Fix HPMC handler by increasing size to multiple of 16 bytes
    - Drivers: hv: vmbus: do not mark HV_PCIE as perf_device
    - PCI: hv: Serialize the present and eject work items
    - PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()
    - KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
    - perf/core: Fix use-after-free in uprobe_perf_close()
    - x86/mce/AMD: Get address from already initialized block
    - hwmon: (ina2xx) Fix access to uninitialized mutex
    - ath9k: Protect queue draining by rcu_read_lock()
    - x86/apic: Fix signedness bug in APIC ID validity checks
    - f2fs: fix heap mode to reset it back
    - block: Change a rcu_read_{lock,unlock}_sched() pair into
      rcu_read_{lock,unlock}()
    - nvme: Skip checking heads without namespaces
    - lib: fix stall in __bitmap_parselist()
    - blk-mq: order getting budget and driver tag
    - blk-mq: don't keep offline CPUs mapped to hctx 0
    - ovl: fix lookup with middle layer opaque dir and absolute path redirects
    - xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling
    - hugetlbfs: fix bug in pgoff overflow checking
    - nfsd: fix incorrect umasks
    - scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure
    - block/loop: fix deadlock after loop_set_status
    - nfit: fix region registration vs block-data-window ranges
    - s390/qdio: don't retry EQBS after CCQ 96
    - s390/qdio: don't merge ERROR output buffers
    - s390/ipl: ensure loadparm valid flag is set
    - get_user_pages_fast(): return -EFAULT on access_ok failure
    - mm/gup_benchmark: handle gup failures
    - getname_kernel() needs to make sure that ->name != ->iname in long case
    - Bluetooth: Fix connection if directed advertising and privacy is used
    - Bluetooth: hci_bcm: Treat Interrupt ACPI resources as always being
      active-low
    - rtl8187: Fix NULL pointer dereference in priv->conf_mutex
    - ovl: set lower layer st_dev only if setting lower st_ino
    - Linux 4.15.18

  * Kernel bug when unplugging Thunderbolt 3 cable, leaves xHCI host controller
    dead (LP: #1768852)
    - xhci: Fix Kernel oops in xhci dbgtty

  * Incorrect blacklist of bcm2835_wdt (LP: #1766052)
    - [Packaging] Fix missing watchdog for Raspberry Pi

  * CVE-2018-8087
    - mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()

  * Integrated Webcam Realtek Integrated_Webcam_HD (0bda:58f4) not working in
    DELL XPS 13 9370 with firmware 1.50 (LP: #1763748)
    - SAUCE: media: uvcvideo: Support realtek's UVC 1.5 device

  * [ALSA] [PATCH] Clevo P950ER ALC1220 Fixup (LP: #1769721)
    - SAUCE: ALSA: hda/realtek - Clevo P950ER ALC1220 Fixup

  * Bionic: Intermittently sent to Emergency Mode on boot with unhandled kernel
    NULL pointer dereference at  0000000000000980 (LP: #1768292)
    - thunderbolt: Prevent crash when ICM firmware is not running

  * linux-snapdragon: reduce EPROBEDEFER noise during boot (LP: #1768761)
    - [Config] snapdragon: DRM_I2C_ADV7511=y

  * regression Aquantia Corp. AQC107 4.15.0-13-generic -> 4.15.0-20-generic ?
    (LP: #1767088)
    - net: aquantia: Regression on reset with 1.x firmware
    - net: aquantia: oops when shutdown on already stopped device

  * e1000e msix interrupts broken in linux-image-4.15.0-15-generic
    (LP: #1764892)
    - e1000e: Remove Other from EIAC

  * Acer Swift sf314-52 power button not managed  (LP: #1766054)
    - SAUCE: platform/x86: acer-wmi: add another KEY_POWER keycode

  * set PINCFG_HEADSET_MIC to parse_flags for Dell precision 3630 (LP: #1766398)
    - ALSA: hda/realtek - set PINCFG_HEADSET_MIC to parse_flags

  * Change the location for one of two front mics on a lenovo thinkcentre
    machine (LP: #1766477)
    - ALSA: hda/realtek - adjust the location of one mic

  * SRU: bionic: apply 50 ZFS upstream bugfixes (LP: #1764690)
    - SAUCE: (noup) Update zfs to 0.7.5-1ubuntu15 (LP: #1764690)

  * [8086:3e92] display becomes blank after S3 (LP: #1763271)
    - drm/i915/edp: Do not do link training fallback or prune modes on EDP

 -- Stefan Bader <stefan.ba...@canonical.com>  Wed, 23 May 2018 18:54:55 +0200

** Changed in: linux (Ubuntu Bionic)
       Status: Fix Committed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-1092

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-3639

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2018-8087

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1767927

Title:
  ISST-LTE:pKVM:Ubuntu1804: rcu_sched self-detected stall on CPU follow
  by CPU ATTEMPT TO RE-ENTER FIRMWARE!

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Released

Bug description:
  == Comment: #0 - Application Cdeadmin <cdead...@us.ibm.com> -
  2018-03-20 14:10:53 ==

  
  == Comment: #1 - Application Cdeadmin <cdead...@us.ibm.com> - 2018-03-20 14:10:54 ==
  == Comment: #2 - Application Cdeadmin <cdead...@us.ibm.com> - 2018-03-20 14:10:56 ==
  ------- Comment From dougmill-ibm 2018-03-20 13:51:47 EDT -------
  This problem is not tied to a Linux distro. It will be fixed in firmware, as
  I understand it. Let us close any redundant issues for this same problem. Mark
  them as duplicate.

  == Comment: #3 - Application Cdeadmin <cdead...@us.ibm.com> - 2018-03-20 15:50:54 ==
  ------- Comment From mzipse 2018-03-20 15:44:26 EDT -------
  @stewart-ibm @svaidy , I need you to take a first look.  The stop fixes that
  Vaidy had previously highlighted in a recent note are included in the 3/15 PNOR.

  == Comment: #5 - Application Cdeadmin <cdead...@us.ibm.com> - 2018-04-04 16:10:56 ==
  ------- Comment From haochanh 2018-04-04 16:04:07 EDT -------
  We updated to 0330 (bmc=1.18), then hit bug 1134. We are currently running
  with stop5 disabled but still see the watchdog hard lockup.
  After 2 hours of test run, I am seeing the "Watchdog: Lockup" and "became
  unstuck" messages:
  ****************************
  [Wed Apr  4 13:38:25 2018] Watchdog CPU:42 Hard LOCKUP
  [Wed Apr  4 13:38:25 2018] Modules linked in: vhost_net vhost macvtap macvlan 
tap xfs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc 
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 
rpcsec_gss_krb5 nfsv4 nfs fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) 
ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) esp6_offload esp6 esp4_offload 
esp4 xfrm_algo mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) cxl 
pnv_php mlx4_en(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) devlink 
mlx_compat(OE) kvm_hv kvm binfmt_misc dm_service_time dm_multipath scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua input_leds joydev mac_hid idt_89hpesx ipmi_powernv
  [Wed Apr  4 13:38:25 2018]  vmx_crypto ipmi_devintf at24 ofpart 
uio_pdrv_genirq cmdlinepart uio powernv_flash ipmi_msghandler mtd 
crct10dif_vpmsum opal_prd ibmpowernv nfsd sch_fq_codel auth_rpcgss nfs_acl 
lockd grace sunrpc knem(OE) ip_tables x_tables autofs4 btrfs xor zstd_compress 
raid6_pq ses enclosure scsi_transport_sas hid_generic usbhid hid lpfc ast 
i2c_algo_bit ttm drm_kms_helper nvmet_fc syscopyarea sysfillrect nvmet 
sysimgblt fb_sys_fops nvme_fc nvme_fabrics crc32c_vpmsum drm i40e 
scsi_transport_fc aacraid [last unloaded: mlxfw]
  [Wed Apr  4 13:38:25 2018] CPU: 42 PID: 0 Comm: swapper/42 Tainted: G           OE    4.15.0-12-generic #13
  [Wed Apr  4 13:38:25 2018] NIP:  c0000000000a3ca4 LR: c0000000000a3ca4 CTR: c000000000008000
  [Wed Apr  4 13:38:25 2018] REGS: c000000ff596fc40 TRAP: 0100   Tainted: G           OE     (4.15.0-12-generic)
  [Wed Apr  4 13:38:25 2018] MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 24004482  XER: 20040000
  [Wed Apr  4 13:38:25 2018] CFAR: c000000ff596fda0 SOFTE: 42
                             GPR00: c0000000000a3ca4 c000000ff596fda0 c0000000016eb200 c000000ff596fc40
                             GPR04: b000000000001033 c0000000000a3690 0000000024004484 0000000ffa450000
                             GPR08: 0000000000000001 c000000000d10ed8 00000000000000ff 0000000000000000
                             GPR12: 9000000000121033 c000000007a3ce00 c000000ff596ff90 0000000000000000
                             GPR16: 0000000000000000 c000000000047840 c000000000047810 c0000000011b5380
                             GPR20: 0000000000000800 c000000001722484 000000000000002a 0000000000000000
                             GPR24: 00000000000000a8 0000000000000007 0000000000000000 0000000000000007
                             GPR28: c00000000161d270 c000000ffb666fd8 c00000000161d528 0000000000000007
  [Wed Apr  4 13:38:25 2018] NIP [c0000000000a3ca4] power9_idle_type+0x24/0x40
  [Wed Apr  4 13:38:25 2018] LR [c0000000000a3ca4] power9_idle_type+0x24/0x40
  [Wed Apr  4 13:38:25 2018] Call Trace:
  [Wed Apr  4 13:38:25 2018] [c000000ff596fda0] [c0000000000a3ca4] power9_idle_type+0x24/0x40 (unreliable)
  [Wed Apr  4 13:38:25 2018] [c000000ff596fdc0] [c000000000ad1240] stop_loop+0x40/0x5c
  [Wed Apr  4 13:38:25 2018] [c000000ff596fdf0] [c000000000acd9a4] cpuidle_enter_state+0xa4/0x450
  [Wed Apr  4 13:38:25 2018] [c000000ff596fe50] [c00000000017195c] call_cpuidle+0x4c/0x90
  [Wed Apr  4 13:38:25 2018] [c000000ff596fe70] [c000000000171d70] do_idle+0x2b0/0x330
  [Wed Apr  4 13:38:25 2018] [c000000ff596fec0] [c000000000172028] cpu_startup_entry+0x38/0x50
  [Wed Apr  4 13:38:25 2018] [c000000ff596fef0] [c000000000049c30] start_secondary+0x4f0/0x510
  [Wed Apr  4 13:38:25 2018] [c000000ff596ff90] [c00000000000aa6c] start_secondary_prolog+0x10/0x14
  [Wed Apr  4 13:38:25 2018] Instruction dump:
  [Wed Apr  4 13:38:25 2018] ebe1fff8 7c0803a6 4e800020 3c4c0164 38427580 7c0802a6 60000000 7c0802a6
  [Wed Apr  4 13:38:25 2018] f8010010 f821ffe1 4bfff97d 4bf732d9 <60000000> 38210020 e8010010 7c0803a6
  [Wed Apr  4 13:38:25 2018] Watchdog CPU:43 Hard LOCKUP
  [Wed Apr  4 13:38:25 2018] Modules linked in: vhost_net vhost macvtap macvlan 
tap xfs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc 
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 
rpcsec_gss_krb5 nfsv4 nfs fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) 
ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) esp6_offload esp6 esp4_offload 
esp4 xfrm_algo mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) cxl 
pnv_php mlx4_en(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) devlink 
mlx_compat(OE) kvm_hv kvm binfmt_misc dm_service_time dm_multipath scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua input_leds joydev mac_hid idt_89hpesx ipmi_powernv
  [Wed Apr  4 13:38:25 2018]  vmx_crypto ipmi_devintf at24 ofpart 
uio_pdrv_genirq cmdlinepart uio powernv_flash ipmi_msghandler mtd 
crct10dif_vpmsum opal_prd ibmpowernv nfsd sch_fq_codel auth_rpcgss nfs_acl 
lockd grace sunrpc knem(OE) ip_tables x_tables autofs4 btrfs xor zstd_compress 
raid6_pq ses enclosure scsi_transport_sas hid_generic usbhid hid lpfc ast 
i2c_algo_bit ttm drm_kms_helper nvmet_fc syscopyarea sysfillrect nvmet 
sysimgblt fb_sys_fops nvme_fc nvme_fabrics crc32c_vpmsum drm i40e 
scsi_transport_fc aacraid [last unloaded: mlxfw]
  [Wed Apr  4 13:38:25 2018] CPU: 43 PID: 0 Comm: swapper/43 Tainted: G           OE    4.15.0-12-generic #13
  [Wed Apr  4 13:38:25 2018] NIP:  c0000000000a3ca4 LR: c0000000000a3ca4 CTR: c000000000008000
  [Wed Apr  4 13:38:25 2018] REGS: c000000ff597fc40 TRAP: 0100   Tainted: G           OE     (4.15.0-12-generic)
  [Wed Apr  4 13:38:25 2018] MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 24004482  XER: 00000000
  [Wed Apr  4 13:38:25 2018] CFAR: c000000ff597fda0 SOFTE: 43
                             GPR00: c0000000000a3ca4 c000000ff597fda0 c0000000016eb200 c000000ff597fc40
                             GPR04: b000000000001033 c0000000000a3690 0000000024004484 ffffffffffffffbf
                             GPR08: 000000000000007f c000000000d10ed8 00000000000000ff ffffffffffffffdf
                             GPR12: 9000000000121033 c000000007a3d900 c000000ff597ff90 0000000000000000
                             GPR16: 0000000000000000 c000000000047840 c000000000047810 c0000000011b5380
                             GPR20: 0000000000000800 c000000001722484 000000000000002b 0000000000000000
                             GPR24: 00000000000000ac 0000000000000007 0000000000000000 0000000000000007
                             GPR28: c00000000161d270 c000000ffb6a6fd8 c00000000161d528 0000000000000007
  [Wed Apr  4 13:38:25 2018] NIP [c0000000000a3ca4] power9_idle_type+0x24/0x40
  [Wed Apr  4 13:38:25 2018] LR [c0000000000a3ca4] power9_idle_type+0x24/0x40
  [Wed Apr  4 13:38:25 2018] Call Trace:
  [Wed Apr  4 13:38:25 2018] [c000000ff597fda0] [c0000000000a3ca4] power9_idle_type+0x24/0x40 (unreliable)
  [Wed Apr  4 13:38:25 2018] [c000000ff597fdc0] [c000000000ad1240] stop_loop+0x40/0x5c
  [Wed Apr  4 13:38:25 2018] [c000000ff597fdf0] [c000000000acd9a4] cpuidle_enter_state+0xa4/0x450
  [Wed Apr  4 13:38:25 2018] [c000000ff597fe50] [c00000000017195c] call_cpuidle+0x4c/0x90
  [Wed Apr  4 13:38:25 2018] [c000000ff597fe70] [c000000000171d70] do_idle+0x2b0/0x330
  [Wed Apr  4 13:38:25 2018] [c000000ff597fec0] [c000000000172028] cpu_startup_entry+0x38/0x50
  [Wed Apr  4 13:38:25 2018] [c000000ff597fef0] [c000000000049c30] start_secondary+0x4f0/0x510
  [Wed Apr  4 13:38:25 2018] [c000000ff597ff90] [c00000000000aa6c] start_secondary_prolog+0x10/0x14
  [Wed Apr  4 13:38:25 2018] Instruction dump:
  [Wed Apr  4 13:38:25 2018] ebe1fff8 7c0803a6 4e800020 3c4c0164 38427580 7c0802a6 60000000 7c0802a6
  [Wed Apr  4 13:38:25 2018] f8010010 f821ffe1 4bfff97d 4bf732d9 <60000000> 38210020 e8010010 7c0803a6
  [Wed Apr  4 13:38:27 2018] Watchdog CPU:42 became unstuck
  [Wed Apr  4 13:38:27 2018] Watchdog CPU:41 became unstuck
  [Wed Apr  4 13:38:27 2018] Watchdog CPU:43 became unstuck
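
  When a test run produces many of these messages, it can help to tally which
  CPUs locked up and which recovered. The following is a minimal sketch, not
  part of the original report: the function name summarize_watchdog and the
  regular expression are assumptions based only on the message format shown in
  the excerpt above.

```python
import re
from collections import defaultdict

# Matches the two watchdog messages seen in the excerpt above.
EVENT_RE = re.compile(r"Watchdog CPU:(\d+) (Hard LOCKUP|became unstuck)")


def summarize_watchdog(log_text):
    """Return {cpu: [event, ...]} in the order the events appear in the log."""
    events = defaultdict(list)
    for match in EVENT_RE.finditer(log_text):
        events[int(match.group(1))].append(match.group(2))
    return dict(events)


# Lines copied from the dmesg excerpt in this comment.
sample = """\
[Wed Apr  4 13:38:25 2018] Watchdog CPU:42 Hard LOCKUP
[Wed Apr  4 13:38:25 2018] Watchdog CPU:43 Hard LOCKUP
[Wed Apr  4 13:38:27 2018] Watchdog CPU:42 became unstuck
[Wed Apr  4 13:38:27 2018] Watchdog CPU:41 became unstuck
[Wed Apr  4 13:38:27 2018] Watchdog CPU:43 became unstuck
"""
summary = summarize_watchdog(sample)
for cpu in sorted(summary):
    print(cpu, summary[cpu])
```

  On the excerpt, this shows that CPU 41 reports "became unstuck" without a
  matching "Hard LOCKUP" line, so its lockup report is presumably outside the
  pasted portion of the log.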

  == Comment: #6 - Application Cdeadmin <cdead...@us.ibm.com> - 2018-04-04 16:50:56 ==
  ------- Comment From youhour 2018-04-04 16:44:55 EDT -------
  pegas 1.1 seems to fix my problem above by @haochanh.  Upgrade your OS and
  see if that will help.

  == Comment: #7 - Michael Neuling <michael.neul...@au1.ibm.com> - 2018-04-05 16:30:31 ==
  So we've seen something similar on other bugs (like
  https://github.com/open-power/boston-openpower/issues/1084#issuecomment-377122303)

  It looks like we may have taken an RCU stall, which causes an NMI
  interrupt to be sent to the stalled CPU. This then interrupts a CPU
  which is in OPAL, which the kernel doesn't do a good job of recovering
  from. There are two patches that can help:

  The first one removes the NMI on RCU stalls:
  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=47712a921bb781caf69fca9eae43be19968816cb

  The second improves the kernel handling of taking an NMI/sreset inside
  OPAL http://patchwork.ozlabs.org/patch/886688/ (not upstream).

  == Comment: #15 - Application Cdeadmin <cdead...@us.ibm.com> - 2018-04-12 08:41:01 ==
  ------- Comment From youhour 2018-04-12 08:32:21 EDT -------
  @mikey Do we have commit for the fix yet?

  == Comment: #16 - Gustavo Luiz Ferreira Walbon <gwal...@br.ibm.com> - 2018-04-12 13:46:51 ==
  All,

  Three of the four patches from the original patchset were accepted
  upstream. The one still missing is '[RFC,4/4] powerpc/xmon: Detect if
  OPAL was interrupted and mark unrecoverable'
  (https://patchwork.ozlabs.org/patch/886691/).

  [ATTENTION] The Ubuntu kernel freeze is coming this week.

  
  [1/4]
  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/arch?h=next&id=15b4dd7981496f51c5f9262a5e0761e48de6655f

  powerpc/64s: return more carefully from sreset NMI
  System Reset, being an NMI, must return more carefully than other
  interrupts. It has traditionally returned via the normal return
  from exception path, but that has a number of problems.

  - r13 does not get restored if returning to kernel. This is for
    interrupts which may cause a context switch, which sreset will
    never do. Interrupting OPAL (which uses a different r13) is one
    place where this causes breakage.

  - It may cause several other problems returning to kernel with
    preempt or TIF_EMULATE_STACK_STORE if it hits at the wrong time.

  It's safer just to have a simple restore and return, like machine
  check which is the other NMI.

  
  [2/4]
  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/arch?h=next&id=d40b6768e45bd9213139b2d91d30c7692b6007b1

  powerpc/64s: sreset panic if there is no debugger or crash dump handlers
  system_reset_exception does most of its own crash handling now,
  invoking the debugger or crash dumps if they are registered. If not,
  then it goes through to die() to print stack traces, and then is
  supposed to panic (according to comments).

  However after die() prints oopses, it does its own handling which
  doesn't allow system_reset_exception to panic (e.g., it may just
  kill the current process). This patch causes sreset exceptions to
  return from die after it prints messages but before acting.

  This also stops die from invoking the debugger on 0x100 crashes.
  system_reset_exception similarly calls the debugger. It had been
  thought this was harmless (because if the debugger was disabled,
  neither call would fire, and if it was enabled the first call
  would return). However in some cases like xmon 'X' command, the
  debugger returns 0, which currently causes it to be entered
  again (first in system_reset_exception, then in die), which is
  confusing.


  [3/4]
  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/arch?h=next&id=741de617661794246f84a21a02fc5e327bffc9ad

  powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
  opal_nvram_write currently just assumes success if it encounters an
  error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO
  on other errors instead.

  Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus
  RTAS fallbacks")
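
  The behaviour change in [3/4] can be pictured with a small user-space model.
  This is only a sketch of the control flow described in the commit message,
  not the kernel code: nvram_write and make_firmware are hypothetical names,
  the firmware call is a stub, and the numeric OPAL return codes are assumed
  values for illustration.

```python
import errno

# Assumed return-code values, for illustration only.
OPAL_SUCCESS = 0
OPAL_BUSY = -2
OPAL_HARDWARE = -6
OPAL_BUSY_EVENT = -12


def nvram_write(firmware_write, data, offset):
    """Retry while firmware reports busy; map any other failure to -EIO."""
    rc = OPAL_BUSY
    while rc in (OPAL_BUSY, OPAL_BUSY_EVENT):
        rc = firmware_write(data, offset)
    if rc != OPAL_SUCCESS:
        # Before the fix, this case was treated as a successful write.
        return -errno.EIO
    return len(data)


def make_firmware(responses):
    """Hypothetical firmware stub that replays a list of return codes."""
    replay = iter(responses)
    return lambda data, offset: next(replay)


# Busy twice, then success: the write eventually goes through.
ok = nvram_write(make_firmware([OPAL_BUSY, OPAL_BUSY_EVENT, OPAL_SUCCESS]), b"abcd", 0)
# Busy once, then an unrelated error: the caller now sees -EIO.
failed = nvram_write(make_firmware([OPAL_BUSY, OPAL_HARDWARE]), b"abcd", 0)
print(ok, failed)
```

  The point of the model is the final check: only the two busy codes cause a
  retry, and every other non-success code is reported to the caller instead of
  being silently counted as written data.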

  == Comment: #17 - Application Cdeadmin <cdead...@us.ibm.com> - 2018-04-12 17:30:58 ==
  ------- Comment From youhour 2018-04-12 17:30:25 EDT -------
  Stewart mentioned that these patches need to be picked by the distros.

  @bwmashak Ben do you know who from the distros needs to be informed?

  == Comment: #18 - Michael Y. Lim <youh...@us.ibm.com> - 2018-04-13 10:12:40 ==
  Gustavo,  please let us know which kernel version has this patch.  Thank you!

  == Comment: #19 - Gustavo Luiz Ferreira Walbon <gwal...@br.ibm.com> - 2018-04-13 14:22:06 ==
  (In reply to comment #18)
  > Gustavo,  please let us know which kernel version has this patch.  Thank you!

  Hello Michael,

  No official distro kernel has those patches yet; so far they are only upstream.
  I have generated a build with just the requested patch set, based on the
  Ubuntu kernel v4.15.0-12.13:
  http://pokgsa.ibm.com/gsa/pokgsa/home/g/w/gwalbon/web/public/Bug165882/v2/

  == Comment: #21 - Benjamin W. Mashak <mas...@us.ibm.com> - 2018-04-24 15:59:09 ==
  Gustavo Luiz Ferreira Walbon, what's the outlook to upstream and close this
  BZ?  It's currently on the must-fix list for upcoming GA in May.

  == Comment: #22 - Gustavo Luiz Ferreira Walbon <gwal...@br.ibm.com> - 2018-04-25 08:28:01 ==
  (In reply to comment #21)
  > Gustavo Luiz Ferreira Walbon, what's the outlook to upstream and close this
  > BZ?  It's currently on the must-fix list for upcoming GA in May.

  Benjamin,

  The missing patch was an RFC by Nicholas Piggin. Because the series was
  an RFC, only 3 of the 4 patches were judged relevant, and those were
  added to the powerpc tree.

  I hope these 3 patches are enough.

  
  == Comment: #24 - Gustavo Luiz Ferreira Walbon <gwal...@br.ibm.com> - 2018-04-25 08:34:35 ==
  Adding a patch series to fix a CPU lockup in UbuntuKVM 18.04.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1767927/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
