This change was made by a bot.
** Changed in: linux (Ubuntu)
Status: New => Confirmed
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1906716
Title:
Stack trace booting 20.04 LTS server on system with dual Xeon Gold
6240 CPUs
Status in linux package in Ubuntu:
Confirmed
Bug description:
I noticed this in syslog while investigating an unrelated issue today.
I have Focal installed on a Fujitsu RX2530 M5 server with two Xeon
Gold 6240 18c/36t CPUs installed. Every reboot results in the
following MSR stack trace:
Dec 3 17:34:31 nabbit kernel: [ 0.002463] smpboot: CPU 18 Converting
physical 0 to logical die 1
Dec 3 17:34:31 nabbit kernel: [ 0.002463] unchecked MSR access error:
WRMSR to 0x10f (tried to write 0x0000000000000000) at rIP: 0xffffffff81c78b04
(native_write_msr+0x4/0x30)
Dec 3 17:34:31 nabbit kernel: [ 0.002463] Call Trace:
Dec 3 17:34:31 nabbit kernel: [ 0.002463] ?
intel_pmu_cpu_starting+0x87/0x270
Dec 3 17:34:31 nabbit kernel: [ 0.002463] ? x86_pmu_dead_cpu+0x30/0x30
Dec 3 17:34:31 nabbit kernel: [ 0.002463] x86_pmu_starting_cpu+0x1a/0x30
Dec 3 17:34:31 nabbit kernel: [ 0.002463]
cpuhp_invoke_callback+0x9b/0x580
Dec 3 17:34:31 nabbit kernel: [ 0.002463] notify_cpu_starting+0x66/0x80
Dec 3 17:34:31 nabbit kernel: [ 0.002463] start_secondary+0xaa/0x1c0
Dec 3 17:34:31 nabbit kernel: [ 0.002463] secondary_startup_64+0xa4/0xb0
Dec 3 17:34:31 nabbit kernel: [ 0.498575] #19 #20 #21 #22 #23 #24 #25
#26 #27 #28 #29 #30 #31 #32 #33 #34 #35
Dec 3 17:34:31 nabbit kernel: [ 0.618576] .... node #0, CPUs: #36
Dec 3 17:34:31 nabbit kernel: [ 0.623308] MDS CPU bug present and SMT on,
data leak possible. See
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more
details.
Dec 3 17:34:31 nabbit kernel: [ 0.623308] TAA CPU bug present and SMT on,
data leak possible. See
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html
for more details.
Dec 3 17:34:31 nabbit kernel: [ 0.623308] #37 #38 #39 #40 #41 #42 #43
#44 #45 #46 #47 #48 #49 #50 #51 #52 #53
Dec 3 17:34:31 nabbit kernel: [ 0.672450] .... node #1, CPUs: #54 #55
#56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71
Dec 3 17:34:31 nabbit kernel: [ 0.729432] smp: Brought up 2 nodes, 72 CPUs
Dec 3 17:34:31 nabbit kernel: [ 0.729432] smpboot: Max logical packages: 2
Dec 3 17:34:31 nabbit kernel: [ 0.729432] smpboot: Total of 72 processors
activated (374479.29 BogoMIPS)
It doesn't seem to be catastrophic, but it is troubling to find this in the logs.
On a different Fujitsu server (RX2540 M5) with 2x Xeon Gold 6242 CPUs
(16c/32t), this trace is not present, so this could indicate a problem
with this particular machine or with this particular CPU model.
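To check other machines for the same error, the syslog can be searched
mechanically. The following is only a rough sketch in Python, not part of
any Ubuntu tooling: the function name find_msr_errors and the regular
expression are my own assumptions, modelled on the message format seen in
the trace above. It collapses whitespace first because long log lines may
be wrapped, as they are in this report.

```python
import re

# Pattern modelled on the "unchecked MSR access error" message shown in the
# trace above. The "(tried to write ...)" part is optional so that read
# errors (RDMSR from ...) are matched as well.
MSR_ERROR = re.compile(
    r"unchecked MSR access error:\s+(?P<op>WRMSR|RDMSR)\s+(?:to|from)\s+"
    r"(?P<msr>0x[0-9a-fA-F]+)"
    r"(?:\s+\(tried to write (?P<value>0x[0-9a-fA-F]+)\))?"
    r"\s+at rIP:\s+(?P<rip>0x[0-9a-fA-F]+)\s+\((?P<symbol>[^)]+)\)"
)


def find_msr_errors(log_text):
    """Return one dict per MSR access error found in log_text."""
    # Long log lines may be wrapped, so collapse all whitespace
    # (including newlines) into single spaces before matching.
    flat = " ".join(log_text.split())
    return [m.groupdict() for m in MSR_ERROR.finditer(flat)]


# Excerpt copied from the trace in this report.
sample = """\
Dec 3 17:34:31 nabbit kernel: [ 0.002463] unchecked MSR access error:
WRMSR to 0x10f (tried to write 0x0000000000000000) at rIP: 0xffffffff81c78b04
(native_write_msr+0x4/0x30)
"""

for hit in find_msr_errors(sample):
    print(hit["op"], hit["msr"], hit["value"], hit["symbol"])
```

Running the same function over /var/log/syslog from each server would show
whether the error is limited to one host.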
Here is the SMP boot log from the non-failing machine:
Dec 2 16:02:56 polari kernel: [ 1.522346] smpboot: CPU0: Intel(R) Xeon(R)
Gold 6242 CPU @ 2.80GHz (family: 0x6, model: 0x55, stepping: 0x5)
Dec 2 16:02:56 polari kernel: [ 1.522575] Performance Events: PEBS fmt3+,
Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
Dec 2 16:02:56 polari kernel: [ 1.522584] ... version: 4
Dec 2 16:02:56 polari kernel: [ 1.522585] ... bit width: 48
Dec 2 16:02:56 polari kernel: [ 1.522587] ... generic registers: 4
Dec 2 16:02:56 polari kernel: [ 1.522588] ... value mask:
0000ffffffffffff
Dec 2 16:02:56 polari kernel: [ 1.522589] ... max period:
00007fffffffffff
Dec 2 16:02:56 polari kernel: [ 1.522591] ... fixed-purpose events: 3
Dec 2 16:02:56 polari kernel: [ 1.522592] ... event mask:
000000070000000f
Dec 2 16:02:56 polari kernel: [ 1.522665] rcu: Hierarchical SRCU
implementation.
Dec 2 16:02:56 polari kernel: [ 1.524965] NMI watchdog: Enabled.
Permanently consumes one hw-PMU counter.
Dec 2 16:02:56 polari kernel: [ 1.525875] smp: Bringing up secondary CPUs
...
Dec 2 16:02:56 polari kernel: [ 1.525990] x86: Booting SMP configuration:
Dec 2 16:02:56 polari kernel: [ 1.525992] .... node #0, CPUs: #1
#2 #3
Dec 2 16:02:56 polari kernel: [ 1.533485] .... node #1, CPUs: #4 #5
#6 #7
Dec 2 16:02:56 polari kernel: [ 1.543960] .... node #0, CPUs: #8 #9
#10 #11
Dec 2 16:02:56 polari kernel: [ 1.553544] .... node #1, CPUs: #12 #13
#14 #15
Dec 2 16:02:56 polari kernel: [ 1.564701] .... node #2, CPUs: #16
Dec 2 16:02:56 polari kernel: [ 0.002176] smpboot: CPU 16 Converting
physical 0 to logical die 1
Dec 2 16:02:56 polari kernel: [ 1.651254] #17 #18 #19
Dec 2 16:02:56 polari kernel: [ 1.659278] .... node #3, CPUs: #20 #21
#22 #23
Dec 2 16:02:56 polari kernel: [ 1.669669] .... node #2, CPUs: #24 #25
#26 #27
Dec 2 16:02:56 polari kernel: [ 1.680637] .... node #3, CPUs: #28 #29
#30 #31
Dec 2 16:02:56 polari kernel: [ 1.691394] .... node #0, CPUs: #32
Dec 2 16:02:56 polari kernel: [ 1.693845] MDS CPU bug present and SMT on,
data leak possible. See
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more
details.
Dec 2 16:02:56 polari kernel: [ 1.693845] TAA CPU bug present and SMT on,
data leak possible. See
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html
for more details.
Dec 2 16:02:56 polari kernel: [ 1.693845] #33 #34 #35
Dec 2 16:02:56 polari kernel: [ 1.701504] .... node #1, CPUs: #36 #37
#38 #39
Dec 2 16:02:56 polari kernel: [ 1.712687] .... node #0, CPUs: #40 #41
#42 #43
Dec 2 16:02:56 polari kernel: [ 1.723263] .... node #1, CPUs: #44 #45
#46 #47
Dec 2 16:02:56 polari kernel: [ 1.733658] .... node #2, CPUs: #48 #49
#50 #51
Dec 2 16:02:56 polari kernel: [ 1.744372] .... node #3, CPUs: #52 #53
#54 #55
Dec 2 16:02:56 polari kernel: [ 1.755243] .... node #2, CPUs: #56 #57
#58 #59
Dec 2 16:02:56 polari kernel: [ 1.765640] .... node #3, CPUs: #60 #61
#62 #63
Dec 2 16:02:56 polari kernel: [ 1.776965] smp: Brought up 4 nodes, 64 CPUs
Dec 2 16:02:56 polari kernel: [ 1.776965] smpboot: Max logical packages: 2
Dec 2 16:02:56 polari kernel: [ 1.776965] smpboot: Total of 64 processors
activated (358464.56 BogoMIPS)
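As a sanity check on the two boot logs, the reported CPU totals can be
compared with sockets x cores x threads per core. This is a small sketch
only; the helper name logical_cpus is made up for illustration, and the
figures are the ones quoted in this report (2x 18c/36t on nabbit, 2x
16c/32t on polari, so two threads per core).

```python
def logical_cpus(sockets, cores_per_socket, threads_per_core=2):
    """Expected number of logical CPUs with SMT enabled."""
    return sockets * cores_per_socket * threads_per_core


# host: (sockets, cores per socket, CPUs reported by "smp: Brought up ...")
machines = {
    "nabbit": (2, 18, 72),  # RX2530 M5, Xeon Gold 6240
    "polari": (2, 16, 64),  # RX2540 M5, Xeon Gold 6242
}

for host, (sockets, cores, reported) in machines.items():
    expected = logical_cpus(sockets, cores)
    status = "matches" if expected == reported else "differs"
    print(host, "expected", expected, "reported", reported, status)
```

Both totals agree with the logs, so all CPUs on the affected machine were
brought up despite the trace.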
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-56-generic 5.4.0-56.62
ProcVersionSignature: Ubuntu 5.4.0-56.62-generic 5.4.73
Uname: Linux 5.4.0-56-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Dec 3 20:02 seq
crw-rw---- 1 root audio 116, 33 Dec 3 20:02 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.10
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq',
'/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: skip
Date: Thu Dec 3 20:15:56 2020
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 0424:2533 Microchip Technology, Inc. (formerly SMSC)
Bus 001 Device 004: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard
and Mouse
Bus 001 Device 003: ID 046b:ff01 American Megatrends, Inc. Virtual Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: FUJITSU PRIMERGY RX2530 M5
PciMultimedia:
ProcEnviron:
TERM=screen-256color
PATH=(custom, no user)
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-56-generic
root=UUID=0e82de6f-eac2-426d-b89e-e52b1acaa792 ro console=tty0
RelatedPackageVersions:
linux-restricted-modules-5.4.0-56-generic N/A
linux-backports-modules-5.4.0-56-generic N/A
linux-firmware 1.187.4
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/17/2019
dmi.bios.vendor: FUJITSU // American Megatrends Inc.
dmi.bios.version: V5.0.0.14 R1.15.0 for D3383-B1x
dmi.board.name: D3383-B1
dmi.board.vendor: FUJITSU
dmi.board.version: S26361-D3383-B13 WGS04 GS01
dmi.chassis.asset.tag: nabbit
dmi.chassis.type: 23
dmi.chassis.vendor: FUJITSU
dmi.chassis.version: RX2530M5R3
dmi.modalias:
dmi:bvnFUJITSU//AmericanMegatrendsInc.:bvrV5.0.0.14R1.15.0forD3383-B1x:bd10/17/2019:svnFUJITSU:pnPRIMERGYRX2530M5:pvr:rvnFUJITSU:rnD3383-B1:rvrS26361-D3383-B13WGS04GS01:cvnFUJITSU:ct23:cvrRX2530M5R3:
dmi.product.family: SERVER
dmi.product.name: PRIMERGY RX2530 M5
dmi.product.sku: S26361-K1659-Vxxx
dmi.sys.vendor: FUJITSU
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1906716/+subscriptions