On Thu, 16 Jan 2020 02:14:16 -0000 dann frazier <dann.fraz...@canonical.com> wrote:
> I built a kernel with the proposed patches[*] and ran a reboot/kernel
> compile test on 4 systems. The tests survived 46 total iterations
> (~12/system) before I interrupted. Two systems failed with "Synchronous
> External Abort: synchronous parity or ECC error" errors.
>
> I've reverted the systems back to 4.15.0-70 - the kernel before the
> cpufeature/errata patches that caused this - to see if these SEA errors
> are a regression.
>
> [*] https://lists.ubuntu.com/archives/kernel-team/2020-January/106909.html

I've run 75 iterations of reboot/compile-kernel and encountered 3 gcc
segmentation faults. Unfortunately, my test didn't capture the dmesg log,
but it's likely that these are due to the ECC problems we're (still?)
seeing.

There was also another issue during one of the reboots, which is probably
unrelated and due to a flaky BMC:

[ 33.896320] ipmi_ssif 0-0012: IPMI message handler: device id demangle failed: -22
[ 33.896354] ipmi_ssif 0-0012: Unable to get the device id: -5
[ 33.987825] ipmi_ssif 0-0012: Found new BMC (man_id: 0x000000, prod_id: 0xaabb, dev_id: 0x20)
[ 33.987858] Unable to handle kernel read from unreadable memory at virtual address 00000018
[ 33.999300] Mem abort info:
[ 34.005475] ESR = 0x96000004
[ 34.011454] Exception class = DABT (current EL), IL = 32 bits
[ 34.020168] SET = 0, FnV = 0
[ 34.025893] EA = 0, S1PTW = 0
[ 34.031617] Data abort info:
[ 34.037060] ISV = 0, ISS = 0x00000004
[ 34.043448] CM = 0, WnR = 0
[ 34.048949] user pgtable: 4k pages, 48-bit VAs, pgd = 000000002799ee91
[ 34.058063] [0000000000000018] *pgd=0000000000000000
[ 34.065624] Internal error: Oops: 96000004 [#1] SMP
[ 34.073090] Modules linked in: nls_iso8859_1 sch_fq_codel thunderx_zip thunderx_edac ib_iser cavium_rng_vf rdma_cm ipmi_ssif(+) ipmi_devintf shpchp cavium_rng iw_cm ipmi_msghandler ib_cm gpio_keys uio_pdrv_genirq uio ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear ast i2c_algo_bit drm_kms_helper nicvf syscopyarea sysfillrect sysimgblt fb_sys_fops ttm nicpf drm aes_ce_blk aes_ce_cipher crc32_ce crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ahci thunder_bgx libahci i2c_thunderx thunder_xcv mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[ 34.161807] Process kworker/64:1 (pid: 651, stack limit = 0x00000000b0697881)
[ 34.172016] CPU: 64 PID: 651 Comm: kworker/64:1 Not tainted 4.15.18+ #40
[ 34.181723] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
[ 34.193113] Workqueue: events redo_bmc_reg [ipmi_msghandler]
[ 34.201840] pstate: 80400005 (Nzcv daif +PAN -UAO)
[ 34.209589] pc : smi_send.isra.4+0x80/0x158 [ipmi_msghandler]
[ 34.218275] lr : smi_send.isra.4+0x150/0x158 [ipmi_msghandler]
[ 34.227046] sp : ffff0000128c3b10
[ 34.233209] x29: ffff0000128c3b10 x28: 0000000000000020
[ 34.241305] x27: 0000000000000002 x26: 0000000000000000
[ 34.249437] x25: ffff0000128c3c40 x24: ffff0000128c3c38
[ 34.257455] x23: 0000000000000000 x22: 0000000000000018
[ 34.265500] x21: 0000000000000000 x20: ffff810fb16c8800
[ 34.273558] x19: ffff800faffb0000 x18: ffffffffffffffff
[ 34.281643] x17: 0000000000000005 x16: 0000000000000000
[ 34.289758] x15: ffff000009578c08 x14: ffff810fb0d20187
[ 34.297899] x13: ffff810fb0d20186 x12: 0000000000000030
[ 34.305997] x11: 0101010101010101 x10: ffff7f7f7f7f7f7f
[ 34.314069] x9 : fefdfefefefefeff x8 : ffff810fb16c8800
[ 34.322166] x7 : 0000000000001138 x6 : 000000000000125c
[ 34.330300] x5 : 00000000000000dc x4 : ffff810fbc8f1340
[ 34.338456] x3 : 0000000000000000 x2 : 0000000000000000
[ 34.346633] x1 : ffff810fb16c8800 x0 : ffff810fae4ff800
[ 34.354839] Call trace:
[ 34.360207] smi_send.isra.4+0x80/0x158 [ipmi_msghandler]
[ 34.368450] i_ipmi_request+0x2ac/0x980 [ipmi_msghandler]
[ 34.376716] send_channel_info_cmd+0xac/0xd8 [ipmi_msghandler]
[ 34.385396] __scan_channels.isra.20+0x84/0x180 [ipmi_msghandler]
[ 34.394341] __bmc_get_device_id+0x424/0x8c8 [ipmi_msghandler]
[ 34.402994] redo_bmc_reg+0x6c/0x70 [ipmi_msghandler]
[ 34.410840] process_one_work+0x1e0/0x420
[ 34.417640] worker_thread+0x4c/0x478
[ 34.420416] IPv6: ADDRCONF(NETDEV_UP): enP2p1s0f2: link is not ready
[ 34.424073] kthread+0x134/0x138
[ 34.424081] ret_from_fork+0x10/0x18
[ 34.424089] Code: f908aa74 b4ffff74 f9424e60 aa1403e1 (f94002c2)
[ 34.454826] ---[ end trace b54ad269f357375f ]---
[ 34.467956] ipmi_ssif: Unable to register device: error -5
[ 34.476380] ipmi_ssif 0-0012: Unable to start IPMI SSIF: -5
[ 34.484925] ipmi_ssif: probe of 0-0012 failed with error -5

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1857074

Title:
Cavium ThunderX CN88XX Panic : Unknown reason

Status in linux package in Ubuntu:
Confirmed
Status in linux source package in Bionic:
Confirmed

Bug description:
Series: Bionic
Kernel: 4.15.0-74.84 linux-generic
Steps to reproduce: Install 4.15.0-74.84 Kernel and boot the system.

The following crash was observed while testing the proposed kernel for
the 2019.12.02 SRU Cycle.

This kernel was built to include fixes for the following bugs:

* [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX (LP: #1853326)
  - Revert "arm64: Use firmware to detect CPUs that are not affected by Spectre-v2"
  - Revert "arm64: Get rid of __smccc_workaround_1_hvc_*"
* [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX2 and Kunpeng920 (LP: #1852723)
  - SAUCE: arm64: capabilities: Move setup_boot_cpu_capabilities() call to correct place

The following crash appears to be a NEW bug, not related to the prior
bugs listed above. This bug DOES NOT APPEAR to be related to LP#1857073.
This is another NEW BUG.

Hostname: Starmie

Probable Cause is unknown at this point and still under investigation.

[ OK ] Found device WDC_WD5003ABYZ-011FA0 efi.
Mounting /boot/efi...
[ OK ] Mounted /boot/efi.
[ OK ] Reached target Local File Systems.
Starting AppArmor initialization...
Starting Tell Plymouth To Write Out Runtime Data...
Starting ebtables ruleset management...
[ 20.942427] kernel BUG at /build/linux-pWET3k/linux-4.15.0/fs/buffer.c:1240!
[ 20.951416] Internal error: Oops - BUG: 0 [#1] SMP
[ 20.958153] Modules linked in: nls_iso8859_1 thunderx_edac thunderx_zip cavium_rng_vf shpchp cavium_rng gpio_keys uio_pdrv_genirq ipmi_ssif uio ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nicvf nicpf ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect aes_ce_blk sysimgblt fb_sys_fops aes_ce_cipher crc32_ce drm crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ahci thunder_bgx libahci thunder_xcv i2c_thunderx mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[ 21.044326] Process systemd (pid: 1, stack limit = 0x000000005af6f18b)
[ 21.053858] CPU: 1 PID: 1 Comm: systemd Not tainted 4.15.0-74-generic #84-Ubuntu
[ 21.063931] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
[ 21.074790] pstate: 20400085 (nzCv daIf +PAN -UAO)
[ 21.082096] pc : __find_get_block+0x2e8/0x398
[ 21.088917] lr : __getblk_gfp+0x3c/0x2a8
[ 21.095379] sp : ffff0000099ab7e0
[ 21.101062] x29: ffff0000099ab7e0 x28: 0000000000000000
[ 21.108699] x27: 0000000000000000 x26: 0000000000000000
[ 21.116265] x25: 0000000000000001 x24: 0000000000000000
[ 21.123788] x23: 0000000000000008 x22: ffff801f26116c80
[ 21.131302] x21: ffff801f26116c80 x20: 000000000000245c
[ 21.138808] x19: 0000000000001000 x18: 0000ffffa59c3a70
[ 21.146300] x17: 0000000000000000 x16: 0000000000000000
[ 21.153730] x15: 0000000000000020 x14: 0000000000000012
[ 21.161083] x13: 2f7374696e752f64 x12: 0101010101010101
[ 21.168397] x11: 7f7f7f7f7f7f7f7f x10: ffff00000972d000
[ 21.175689] x9 : 0000000000000000 x8 : ffff801f7ba7e3c0
[ 21.183042] x7 : ffff801f7ba7e3e0 x6 : 0000000000000000
[ 21.190667] x5 : 0000000000000004 x4 : 0000000000000020
[ 21.197955] x3 : 0000000000000008 x2 : 0000000000001000
[ 21.205680] x1 : 000000000000245c x0 : 0000000000000080
[ 21.212918] Call trace:
[ 21.217257] __find_get_block+0x2e8/0x398
[ 21.223160] __getblk_gfp+0x3c/0x2a8
[ 21.228644] ext4_getblk+0xcc/0x1b0
[ 21.233991] ext4_bread_batch+0x78/0x1c8
[ 21.239726] ext4_find_entry+0x2d4/0x598
[ 21.245416] ext4_lookup+0xac/0x278
[ 21.250612] lookup_slow+0xac/0x190
[ 21.255736] walk_component+0x228/0x340
[ 21.261151] link_path_walk+0x2f4/0x568
[ 21.266499] path_parentat+0x44/0x88
[ 21.271521] filename_parentat+0xa0/0x170
[ 21.276924] filename_create+0x60/0x168
[ 21.282082] SyS_symlinkat+0x80/0x128
[ 21.287013] el0_svc_naked+0x30/0x34
[ 21.291835] Code: 17ffffe7 a90363b7 a9046bb9 f9002bbb (d4210000)
[ 21.299191] ---[ end trace b07cecc329f07f48 ]---
[ 21.347488] systemd: 35 output lines suppressed due to ratelimiting
[ 21.355094] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 21.355094]
[ 21.366666] SMP: stopping secondary CPUs
[ 21.371817] Kernel Offset: disabled
[ 21.376517] CPU features: 0x00901108
[ 21.381310] Memory Limit: none
[ 21.385617] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 21.385617]
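
For anyone cross-checking the ipmi_msghandler oops quoted earlier in this message, the ESR value and the faulting instruction from the "Code:" line can be decoded by hand. The following is a minimal Python sketch, not part of the original report. The helper names (`decode_data_abort_esr`, `describe_dfsc`, `decode_ldr_uimm64`) are made up for illustration. The bit positions follow the Armv8-A layout of ESR_ELx for data aborts and the A64 encoding of LDR (immediate, unsigned offset, 64-bit), and only the translation-fault status codes are covered.

```python
def decode_data_abort_esr(esr: int) -> dict:
    """Split an ESR_ELx value for a data abort into the fields dmesg prints."""
    iss = esr & 0x1FFFFFF              # ISS is bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "il": (esr >> 25) & 1,         # 1 means a 32-bit instruction
        "isv": (iss >> 24) & 1,
        "set": (iss >> 11) & 0x3,
        "fnv": (iss >> 10) & 1,
        "ea": (iss >> 9) & 1,
        "cm": (iss >> 8) & 1,
        "s1ptw": (iss >> 7) & 1,
        "wnr": (iss >> 6) & 1,         # 0 means the access was a read
        "dfsc": iss & 0x3F,            # data fault status code
    }


def describe_dfsc(dfsc: int) -> str:
    """Name the translation-fault codes; anything else is left undecoded."""
    if 0x04 <= dfsc <= 0x07:
        return f"translation fault, level {dfsc - 0x04}"
    return f"DFSC 0x{dfsc:02x} (not decoded in this sketch)"


def decode_ldr_uimm64(insn: int):
    """Decode 'LDR Xt, [Xn, #imm]' (unsigned offset, 64-bit), else None."""
    if insn & 0xFFC00000 != 0xF9400000:
        return None
    rt = insn & 0x1F
    rn = (insn >> 5) & 0x1F
    offset = ((insn >> 10) & 0xFFF) * 8   # imm12 is scaled by 8 bytes
    return rt, rn, offset


if __name__ == "__main__":
    fields = decode_data_abort_esr(0x96000004)   # ESR from the oops
    print(hex(fields["ec"]), describe_dfsc(fields["dfsc"]))
    print(decode_ldr_uimm64(0xF94002C2))         # instruction in parentheses
```

Under these assumptions, EC 0x25 is a data abort taken at the current EL, and the fault status is a level 0 translation fault, which agrees with the `*pgd=0000000000000000` line. The faulting instruction decodes to a load of x2 from [x22]; x22 holds 0x18 in the register dump, matching the reported virtual address 00000018 and pointing at a NULL structure pointer plus a small offset.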