On Thu, 16 Jan 2020 02:14:16 -0000
dann frazier <dann.fraz...@canonical.com> wrote:

> I built a kernel with the proposed patches[*] and ran a reboot/kernel
> compile test on 4 systems. The tests survived 46 total iterations
> (~12/system) before I interrupted. Two systems failed with "Synchronous
> External Abort: synchronous parity or ECC error" errors.
> 
> I've reverted the systems back to 4.15.0-70 - the kernel before the
> cpufeature/errata patches that caused this - to see if these SEA errors
> are a regression.
> 
> [*] https://lists.ubuntu.com/archives/kernel-team/2020-January/106909.html
> 

I ran 75 iterations of reboot/compile-kernel and encountered 3 gcc
segmentation faults. Unfortunately, my test didn't capture the dmesg log, but
it's likely that these are due to the ECC problems we're (still?) seeing.

There was also another issue during one of the reboots which is probably
unrelated and due to a flaky BMC:

[   33.896320] ipmi_ssif 0-0012: IPMI message handler: device id demangle failed: -22
[   33.896354] ipmi_ssif 0-0012: Unable to get the device id: -5
[   33.987825] ipmi_ssif 0-0012: Found new BMC (man_id: 0x000000, prod_id: 0xaabb, dev_id: 0x20)
[   33.987858] Unable to handle kernel read from unreadable memory at virtual address 00000018
[   33.999300] Mem abort info:
[   34.005475]   ESR = 0x96000004
[   34.011454]   Exception class = DABT (current EL), IL = 32 bits
[   34.020168]   SET = 0, FnV = 0
[   34.025893]   EA = 0, S1PTW = 0
[   34.031617] Data abort info:
[   34.037060]   ISV = 0, ISS = 0x00000004
[   34.043448]   CM = 0, WnR = 0
[   34.048949] user pgtable: 4k pages, 48-bit VAs, pgd = 000000002799ee91
[   34.058063] [0000000000000018] *pgd=0000000000000000
[   34.065624] Internal error: Oops: 96000004 [#1] SMP
[   34.073090] Modules linked in: nls_iso8859_1 sch_fq_codel thunderx_zip thunderx_edac ib_iser cavium_rng_vf rdma_cm ipmi_ssif(+) ipmi_devintf shpchp cavium_rng iw_cm ipmi_msghandler ib_cm gpio_keys uio_pdrv_genirq uio ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear ast i2c_algo_bit drm_kms_helper nicvf syscopyarea sysfillrect sysimgblt fb_sys_fops ttm nicpf drm aes_ce_blk aes_ce_cipher crc32_ce crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ahci thunder_bgx libahci i2c_thunderx thunder_xcv mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[   34.161807] Process kworker/64:1 (pid: 651, stack limit = 0x00000000b0697881)
[   34.172016] CPU: 64 PID: 651 Comm: kworker/64:1 Not tainted 4.15.18+ #40
[   34.181723] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
[   34.193113] Workqueue: events redo_bmc_reg [ipmi_msghandler]
[   34.201840] pstate: 80400005 (Nzcv daif +PAN -UAO)
[   34.209589] pc : smi_send.isra.4+0x80/0x158 [ipmi_msghandler]
[   34.218275] lr : smi_send.isra.4+0x150/0x158 [ipmi_msghandler]
[   34.227046] sp : ffff0000128c3b10
[   34.233209] x29: ffff0000128c3b10 x28: 0000000000000020
[   34.241305] x27: 0000000000000002 x26: 0000000000000000
[   34.249437] x25: ffff0000128c3c40 x24: ffff0000128c3c38
[   34.257455] x23: 0000000000000000 x22: 0000000000000018
[   34.265500] x21: 0000000000000000 x20: ffff810fb16c8800
[   34.273558] x19: ffff800faffb0000 x18: ffffffffffffffff
[   34.281643] x17: 0000000000000005 x16: 0000000000000000
[   34.289758] x15: ffff000009578c08 x14: ffff810fb0d20187
[   34.297899] x13: ffff810fb0d20186 x12: 0000000000000030
[   34.305997] x11: 0101010101010101 x10: ffff7f7f7f7f7f7f
[   34.314069] x9 : fefdfefefefefeff x8 : ffff810fb16c8800
[   34.322166] x7 : 0000000000001138 x6 : 000000000000125c
[   34.330300] x5 : 00000000000000dc x4 : ffff810fbc8f1340
[   34.338456] x3 : 0000000000000000 x2 : 0000000000000000
[   34.346633] x1 : ffff810fb16c8800 x0 : ffff810fae4ff800
[   34.354839] Call trace:
[   34.360207]  smi_send.isra.4+0x80/0x158 [ipmi_msghandler]
[   34.368450]  i_ipmi_request+0x2ac/0x980 [ipmi_msghandler]
[   34.376716]  send_channel_info_cmd+0xac/0xd8 [ipmi_msghandler]
[   34.385396]  __scan_channels.isra.20+0x84/0x180 [ipmi_msghandler]
[   34.394341]  __bmc_get_device_id+0x424/0x8c8 [ipmi_msghandler]
[   34.402994]  redo_bmc_reg+0x6c/0x70 [ipmi_msghandler]
[   34.410840]  process_one_work+0x1e0/0x420
[   34.417640]  worker_thread+0x4c/0x478
[   34.420416] IPv6: ADDRCONF(NETDEV_UP): enP2p1s0f2: link is not ready
[   34.424073]  kthread+0x134/0x138
[   34.424081]  ret_from_fork+0x10/0x18
[   34.424089] Code: f908aa74 b4ffff74 f9424e60 aa1403e1 (f94002c2)
[   34.454826] ---[ end trace b54ad269f357375f ]---
[   34.467956] ipmi_ssif: Unable to register device: error -5
[   34.476380] ipmi_ssif 0-0012: Unable to start IPMI SSIF: -5
[   34.484925] ipmi_ssif: probe of 0-0012 failed with error -5
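
For reference, the ESR value from this oops can be unpacked by hand. Below is
a minimal Python sketch, assuming the ESR_ELx data abort layout from the Arm
architecture manual (EC in bits 31:26, IL in bit 25, fault status code in ISS
bits 5:0). The function name is made up for illustration and is not part of
any kernel tooling.

```python
def decode_data_abort_esr(esr):
    """Split an arm64 ESR value into the fields the kernel prints for a data abort.

    Bit positions are an assumption taken from the architectural ESR_ELx
    layout; they are only meaningful when EC indicates a data abort.
    """
    iss = esr & 0x1FFFFFF            # ISS: bits 24:0
    return {
        "EC": (esr >> 26) & 0x3F,    # exception class, bits 31:26
        "IL": (esr >> 25) & 1,       # 1 means a 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 1,
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 1,
        "EA": (iss >> 9) & 1,        # external abort flag
        "CM": (iss >> 8) & 1,
        "S1PTW": (iss >> 7) & 1,
        "WnR": (iss >> 6) & 1,       # 0 means the access was a read
        "DFSC": iss & 0x3F,          # data fault status code
    }


fields = decode_data_abort_esr(0x96000004)
for name, value in fields.items():
    print(f"{name:6s} = {value:#x}")
```

With ESR = 0x96000004 this yields EC = 0x25 (data abort taken from the current
EL) and DFSC = 0x04, which as far as I can tell is a level 0 translation
fault. That matches the "DABT (current EL)" and "*pgd=0000000000000000" lines,
and it is not the fault status code used for synchronous parity or ECC errors,
so this oops looks like a plain bad pointer at offset 0x18.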

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1857074

Title:
  Cavium ThunderX CN88XX Panic : Unknown reason

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Confirmed

Bug description:
  Series: Bionic
  Kernel: 4.15.0-74.84 linux-generic
  Steps to reproduce:  Install 4.15.0-74.84 Kernel and boot the system.

  The following crash was observed while testing the proposed kernel for the 2019.12.02 SRU Cycle.
  This kernel was built to include fixes for the following bugs:

    * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX
      (LP: #1853326)
      - Revert "arm64: Use firmware to detect CPUs that are not affected by
        Spectre-v2"
      - Revert "arm64: Get rid of __smccc_workaround_1_hvc_*"

    * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX2 and
      Kunpeng920 (LP: #1852723)
      - SAUCE: arm64: capabilities: Move setup_boot_cpu_capabilities() call to
        correct place

  The following crash appears to be a NEW bug, not related to the prior bugs listed above.
  This bug DOES NOT APPEAR to be related to LP#1857073.

  This is another NEW BUG.

  Hostname: Starmie

  Probable Cause is unknown at this point and still under investigation.

  [  OK  ] Found device WDC_WD5003ABYZ-011FA0 efi.
           Mounting /boot/efi...
  [  OK  ] Mounted /boot/efi.
  [  OK  ] Reached target Local File Systems.
           Starting AppArmor initialization...
           Starting Tell Plymouth To Write Out Runtime Data...
           Starting ebtables ruleset management...
  [   20.942427] kernel BUG at /build/linux-pWET3k/linux-4.15.0/fs/buffer.c:1240!
  [   20.951416] Internal error: Oops - BUG: 0 [#1] SMP
  [   20.958153] Modules linked in: nls_iso8859_1 thunderx_edac thunderx_zip cavium_rng_vf shpchp cavium_rng gpio_keys uio_pdrv_genirq ipmi_ssif uio ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nicvf nicpf ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect aes_ce_blk sysimgblt fb_sys_fops aes_ce_cipher crc32_ce drm crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ahci thunder_bgx libahci thunder_xcv i2c_thunderx mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
  [   21.044326] Process systemd (pid: 1, stack limit = 0x000000005af6f18b)
  [   21.053858] CPU: 1 PID: 1 Comm: systemd Not tainted 4.15.0-74-generic #84-Ubuntu
  [   21.063931] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
  [   21.074790] pstate: 20400085 (nzCv daIf +PAN -UAO)
  [   21.082096] pc : __find_get_block+0x2e8/0x398
  [   21.088917] lr : __getblk_gfp+0x3c/0x2a8
  [   21.095379] sp : ffff0000099ab7e0
  [   21.101062] x29: ffff0000099ab7e0 x28: 0000000000000000
  [   21.108699] x27: 0000000000000000 x26: 0000000000000000
  [   21.116265] x25: 0000000000000001 x24: 0000000000000000
  [   21.123788] x23: 0000000000000008 x22: ffff801f26116c80
  [   21.131302] x21: ffff801f26116c80 x20: 000000000000245c
  [   21.138808] x19: 0000000000001000 x18: 0000ffffa59c3a70
  [   21.146300] x17: 0000000000000000 x16: 0000000000000000
  [   21.153730] x15: 0000000000000020 x14: 0000000000000012
  [   21.161083] x13: 2f7374696e752f64 x12: 0101010101010101
  [   21.168397] x11: 7f7f7f7f7f7f7f7f x10: ffff00000972d000
  [   21.175689] x9 : 0000000000000000 x8 : ffff801f7ba7e3c0
  [   21.183042] x7 : ffff801f7ba7e3e0 x6 : 0000000000000000
  [   21.190667] x5 : 0000000000000004 x4 : 0000000000000020
  [   21.197955] x3 : 0000000000000008 x2 : 0000000000001000
  [   21.205680] x1 : 000000000000245c x0 : 0000000000000080
  [   21.212918] Call trace:
  [   21.217257]  __find_get_block+0x2e8/0x398
  [   21.223160]  __getblk_gfp+0x3c/0x2a8
  [   21.228644]  ext4_getblk+0xcc/0x1b0
  [   21.233991]  ext4_bread_batch+0x78/0x1c8
  [   21.239726]  ext4_find_entry+0x2d4/0x598
  [   21.245416]  ext4_lookup+0xac/0x278
  [   21.250612]  lookup_slow+0xac/0x190
  [   21.255736]  walk_component+0x228/0x340
  [   21.261151]  link_path_walk+0x2f4/0x568
  [   21.266499]  path_parentat+0x44/0x88
  [   21.271521]  filename_parentat+0xa0/0x170
  [   21.276924]  filename_create+0x60/0x168
  [   21.282082]  SyS_symlinkat+0x80/0x128
  [   21.287013]  el0_svc_naked+0x30/0x34
  [   21.291835] Code: 17ffffe7 a90363b7 a9046bb9 f9002bbb (d4210000)
  [   21.299191] ---[ end trace b07cecc329f07f48 ]---
  [   21.347488] systemd: 35 output lines suppressed due to ratelimiting
  [   21.355094] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  [   21.355094]
  [   21.366666] SMP: stopping secondary CPUs
  [   21.371817] Kernel Offset: disabled
  [   21.376517] CPU features: 0x00901108
  [   21.381310] Memory Limit: none
  [   21.385617] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  [   21.385617]
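
Two values in the trace above can be checked mechanically: the faulting
instruction word in the "Code:" line and the exitcode in the panic message.
The following is a rough Python sketch, assuming the A64 BRK encoding (imm16
in bits 20:5) and that arm64 Linux uses BRK immediate 0x800 for BUG(). The
helper name is hypothetical, and treating the exitcode as a raw signal number
is also an assumption.

```python
import signal

# Assumption: the BRK immediate arm64 Linux reserves for BUG().
BUG_BRK_IMM = 0x800


def brk_immediate(insn):
    """Return the imm16 of an A64 BRK instruction, or None if insn is not a BRK."""
    # BRK has fixed bits 31:21 = 0b11010100001 and bits 4:0 = 0.
    if (insn & 0xFFE0001F) != 0xD4200000:
        return None
    return (insn >> 5) & 0xFFFF


# The parenthesised word in the "Code:" line is the instruction at pc.
faulting_insn = 0xD4210000
imm = brk_immediate(faulting_insn)
is_bug_trap = imm == BUG_BRK_IMM

# exitcode=0x0000000b from the panic line, read as a signal number.
exit_signal = signal.Signals(0x0000000B)

print(hex(imm), is_bug_trap, exit_signal.name)
```

If those assumptions hold, the instruction at pc is the BUG() trap itself
(consistent with "kernel BUG at .../fs/buffer.c:1240!") and the exitcode
corresponds to SIGSEGV delivered to init, so the panic is a consequence of the
BUG() in __find_get_block rather than a separate failure.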

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857074/+subscriptions

