Found this commit in mainline and our 6.5 HWE kernel. Checking now to see if there are any prerequisites as well.

commit 5d515ee40cb57ea5331998f27df7946a69f14dc3
Author: Kan Liang <kan.li...@linux.intel.com>
Date:   Thu Jan 12 12:01:05 2023 -0800

    perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2049637

Title:
  Some SPR systems throw kernel warnings from uncore_discovery.c

Status in intel:
  New
Status in linux package in Ubuntu:
  New

Bug description:
  5.15 Kernel Warnings with some Sapphire Rapids CPUs

  On some Sapphire Rapids CPUs we are seeing kernel warnings in the
  syslog:

  https://certification.canonical.com/hardware/202311-32288/submission/341156/
  Intel(R) Xeon(R) Gold 6442Y

  Oct 31 03:35:55 N8 kernel: [ 92.770372] ------------[ cut here ]------------
  Oct 31 03:35:55 N8 kernel: [ 92.825738] WARNING: CPU: 48 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184 uncore_insert_box_info+0x134/0x350
  Oct 31 03:35:55 N8 kernel: [ 92.953850] Modules linked in:
  Oct 31 03:35:55 N8 kernel: [ 92.990464] CPU: 48 PID: 1 Comm: swapper/0 Not tainted 5.15.0-88-generic #98-Ubuntu
  Oct 31 03:35:55 N8 kernel: [ 93.082179] Hardware name: ASUSTeK COMPUTER INC. ESC N8-E11/Z13PN-D32 Series, BIOS 0402 09/08/2023
  Oct 31 03:35:55 N8 kernel: [ 93.189501] RIP: 0010:uncore_insert_box_info+0x134/0x350
  Oct 31 03:35:55 N8 kernel: [ 93.206419] Freeing initrd memory: 106936K
  Oct 31 03:35:55 N8 kernel: [ 93.253138] Code: c2 01 48 83 c0 04 39 d1 0f 8e c6 01 00 00 49 8b 4c 24 38 8b 0c 01 41 89 0c 07 49 8b 74 24 40 8b 34 06 41 89 34 06 39 f9 75 cf <0f> 0b 4c 89 ff e8 b2 07 33 00 4c 89 f7 e8 aa 07 33 00 5b 41 5c 41
  Oct 31 03:35:55 N8 kernel: [ 93.527071] RSP: 0000:ff5c25ed800efc98 EFLAGS: 00010246
  Oct 31 03:35:55 N8 kernel: [ 93.589669] RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000003
  Oct 31 03:35:55 N8 kernel: [ 93.675160] RDX: 0000000000000002 RSI: 0000000000018000 RDI: 0000000000000003
  Oct 31 03:35:55 N8 kernel: [ 93.760654] RBP: ff5c25ed800efcc0 R08: 0000000000000010 R09: ff32ac8a801df260
  Oct 31 03:35:55 N8 kernel: [ 93.846130] R10: 0000000000000246 R11: 00000000ffffffff R12: ff32ac8a8b8412a0
  Oct 31 03:35:55 N8 kernel: [ 93.931613] R13: ff5c25ed800efcf8 R14: ff32ac8a8aa32cb0 R15: ff32ac8a801df260
  Oct 31 03:35:55 N8 kernel: [ 94.017099] FS: 0000000000000000(0000) GS:ff32ac99bfa00000(0000) knlGS:0000000000000000
  Oct 31 03:35:55 N8 kernel: [ 94.114042] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  Oct 31 03:35:55 N8 kernel: [ 94.182871] CR2: 0000000000000000 CR3: 0000000d07e10001 CR4: 0000000000771ee0
  Oct 31 03:35:55 N8 kernel: [ 94.268360] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  Oct 31 03:35:55 N8 kernel: [ 94.353828] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
  Oct 31 03:35:55 N8 kernel: [ 94.439332] PKRU: 55555554
  Oct 31 03:35:55 N8 kernel: [ 94.471788] Call Trace:
  Oct 31 03:35:55 N8 kernel: [ 94.501100] <TASK>
  Oct 31 03:35:55 N8 kernel: [ 94.526275] ? show_trace_log_lvl+0x1d6/0x2ea
  Oct 31 03:35:55 N8 kernel: [ 94.578457] ? show_trace_log_lvl+0x1d6/0x2ea
  Oct 31 03:35:55 N8 kernel: [ 94.630686] ? parse_discovery_table.isra.0+0x162/0x1a0
  Oct 31 03:35:55 N8 kernel: [ 94.693295] ? show_regs.part.0+0x23/0x29
  Oct 31 03:35:55 N8 kernel: [ 94.741331] ? show_regs.cold+0x8/0xd
  Oct 31 03:35:55 N8 kernel: [ 94.785212] ? uncore_insert_box_info+0x134/0x350
  Oct 31 03:35:55 N8 kernel: [ 94.841591] ? __warn+0x8c/0x100
  Oct 31 03:35:55 N8 kernel: [ 94.880281] ? uncore_insert_box_info+0x134/0x350
  Oct 31 03:35:55 N8 kernel: [ 94.936636] ? report_bug+0xa4/0xd0
  Oct 31 03:35:55 N8 kernel: [ 94.978460] ? handle_bug+0x39/0x90
  Oct 31 03:35:55 N8 kernel: [ 95.020246] ? exc_invalid_op+0x19/0x70
  Oct 31 03:35:55 N8 kernel: [ 95.066232] ? asm_exc_invalid_op+0x1b/0x20
  Oct 31 03:35:55 N8 kernel: [ 95.116341] ? uncore_insert_box_info+0x134/0x350
  Oct 31 03:35:55 N8 kernel: [ 95.172708] ? uncore_insert_box_info+0xe3/0x350
  Oct 31 03:35:55 N8 kernel: [ 95.228032] parse_discovery_table.isra.0+0x162/0x1a0
  Oct 31 03:35:55 N8 cloud-init[1992]: |.+.o .o .o o +|
  Oct 31 03:35:55 N8 kernel: [ 95.288570] intel_uncore_has_discovery_tables+0x19e/0x270
  Oct 31 03:35:55 N8 kernel: [ 95.354298] ? type_pmu_register+0x2f/0x42
  Oct 31 03:35:55 N8 kernel: [ 95.403385] intel_uncore_init+0xe3/0x226
  Oct 31 03:35:55 N8 kernel: [ 95.451409] ? type_pmu_register+0x42/0x42
  Oct 31 03:35:55 N8 kernel: [ 95.500506] do_one_initcall+0x46/0x1e0
  Oct 31 03:35:55 N8 kernel: [ 95.546475] do_initcalls+0x12f/0x159
  Oct 31 03:35:55 N8 kernel: [ 95.590372] kernel_init_freeable+0x162/0x1b5
  Oct 31 03:35:55 N8 kernel: [ 95.642556] ? rest_init+0x100/0x100
  Oct 31 03:35:55 N8 kernel: [ 95.685405] kernel_init+0x1b/0x150
  Oct 31 03:35:55 N8 kernel: [ 95.727228] ? rest_init+0x100/0x100
  Oct 31 03:35:55 N8 kernel: [ 95.770054] ret_from_fork+0x1f/0x30
  Oct 31 03:35:55 N8 kernel: [ 95.812906] </TASK>
  Oct 31 03:35:55 N8 kernel: [ 95.839108] ---[ end trace 2d0c57130f45fd62 ]---

  https://certification.canonical.com/hardware/202305-31570/submission/312593/
  Intel(R) Xeon(R) Gold 6426Y

  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135184] ------------[ cut here ]------------
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135185] WARNING: CPU: 0 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184 uncore_insert_box_info+0x134/0x350
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135192] Modules linked in:
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135194] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.0-69-generic #76-Ubuntu
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135198] Hardware name: HPE ProLiant ML110 Gen11/ProLiant ML110 Gen11, BIOS 1.30 03/01/2023
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135200] RIP: 0010:uncore_insert_box_info+0x134/0x350
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135202] Code: c2 01 48 83 c0 04 39 d1 0f 8e c6 01 00 00 49 8b 4c 24 38 8b 0c 01 41 89 0c 07 49 8b 74 24 40 8b 34 06 41 89 34 06 39 f9 75 cf <0f> 0b 4c 89 ff e8 22 a2 32 00 4c 89 f7 e8 1a a2 32 00 5b 41 5c 41
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135206] RSP: 0000:ff3b3e198006bc98 EFLAGS: 00010246
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135209] RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000003
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135210] RDX: 0000000000000002 RSI: 0000000000018000 RDI: 0000000000000003
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135212] RBP: ff3b3e198006bcc0 R08: 0000000000000010 R09: ff31766844f3c5e0
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135214] R10: ff31766844fa4438 R11: 0000000000000000 R12: ff31766844f5fa20
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135216] R13: ff3b3e198006bcf8 R14: ff31766844f3ca20 R15: ff31766844f3c5e0
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135218] FS: 0000000000000000(0000) GS:ff3176e5bf800000(0000) knlGS:0000000000000000
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135220] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135222] CR2: 0000000000000000 CR3: 0000004f35e10001 CR4: 0000000000771ef0
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135224] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135225] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135227] PKRU: 55555554
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135228] Call Trace:
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135230] <TASK>
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135232] parse_discovery_table.isra.0+0x162/0x1a0
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135235] intel_uncore_has_discovery_tables+0x19e/0x270
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135238] ? type_pmu_register+0x21/0x42
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135243] intel_uncore_init+0xe3/0x226
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135246] ? type_pmu_register+0x42/0x42
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135249] do_one_initcall+0x46/0x1e0
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135253] do_initcalls+0x12f/0x159
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135256] kernel_init_freeable+0x162/0x1b5
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135259] ? rest_init+0x100/0x100
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135263] kernel_init+0x1b/0x150
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135265] ? rest_init+0x100/0x100
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135266] ret_from_fork+0x1f/0x30
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135270] </TASK>
  Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135271] ---[ end trace 6011f2a9999291c3 ]---

  This doesn't happen on ALL SPR platforms, but it does happen
  periodically, and it always seems to be centered around
  arch/x86/events/intel/uncore_discovery.c.

  This doesn't seem to cause any stability issues that we've seen, but
  we need to know whether these warnings are innocuous and, better, whether
  this can be fixed so the kernel no longer spits out warnings (which
  triggers the kernel taint flag).
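
For anyone triaging other submissions for the same symptom, a rough sketch of how the WARNING header lines could be pulled out of a saved syslog and grouped by source location. This is only an illustration, not part of the fix: the function names (find_kernel_warnings, summarize) and the SAMPLE_LOG constant are made up for this example, and the regular expression assumes the header format seen in the two traces quoted in this bug.

```python
import re
from collections import Counter

# Matches a kernel WARNING header such as:
#   WARNING: CPU: 48 PID: 1 at arch/.../uncore_discovery.c:184 uncore_insert_box_info+0x134/0x350
# The pattern is an assumption based on the traces in this bug report.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Two header lines copied from the report, plus one unrelated line.
SAMPLE_LOG = """\
Oct 31 03:35:55 N8 kernel: [ 92.825738] WARNING: CPU: 48 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184 uncore_insert_box_info+0x134/0x350
Oct 31 03:35:55 N8 kernel: [ 92.953850] Modules linked in:
Apr 14 17:29:28 ML110Gen11 kernel: [ 2.135185] WARNING: CPU: 0 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184 uncore_insert_box_info+0x134/0x350
"""


def find_kernel_warnings(log_text):
    """Return one dict per kernel WARNING header line found in log_text."""
    hits = []
    for raw in log_text.splitlines():
        match = WARN_RE.search(raw)
        if match:
            hits.append({
                "cpu": int(match["cpu"]),
                "pid": int(match["pid"]),
                "file": match["file"],
                "line": int(match["line"]),
                "symbol": match["symbol"],
            })
    return hits


def summarize(hits):
    """Count warnings per (file, line, symbol) so repeat offenders stand out."""
    return Counter((h["file"], h["line"], h["symbol"]) for h in hits)


if __name__ == "__main__":
    for (path, line, symbol), count in summarize(find_kernel_warnings(SAMPLE_LOG)).items():
        print(f"{count}x {path}:{line} in {symbol}")
```

Grouping on file, line, and symbol rather than on the full message keeps hits from different hosts and kernel builds together, since the CPU number and timestamps differ between machines.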