Same problem.

Supermicro H13SSL-NT motherboard.
Latest firmware/BIOS from Supermicro:
Firmware: 01.03.11 (2025-02-18)
BIOS: 3.4 (2025-02-13)

Single socket AMD EPYC 9124 (16 cores)
Microcode: 0xa101154

Booting from the released server live CD: ubuntu-25.04-live-server-amd64.iso

It gets past GRUB, starts booting, hangs before the installer launches,
then reboots automatically.

Additionally I see lines like this in the BMC web console:
2025-04-20 18:31:19     ProcessorConfiguration  [PC-0153] Configuration error - CPU 1 EX Uncorrectable error - Assertion


It now boots with "mce=off pci=noaer" (both are needed, neither one alone is
enough) on kernel 6.14.0-15 (from the installer live CD).
Earlier Ubuntu 25.04 beta live CDs caused a hard lockup plus a BMC error
message like the one above. The system could not be rebooted or power-cycled
from the BMC and required physically pulling the power cables...
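
For anyone else testing this workaround, it is easy to boot with only one of
the two parameters by mistake. A minimal sketch in Python that checks a kernel
command line for both; the function name is made up for illustration, and
reading /proc/cmdline assumes a running Linux system:

```python
def missing_workaround_params(cmdline, required=("mce=off", "pci=noaer")):
    """Return the required parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    try:
        with open("/proc/cmdline") as f:
            current = f.read()
    except OSError:
        current = ""
    print("missing:", missing_workaround_params(current))
```

An empty list means both parameters are active for the current boot.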

This system had no issues earlier, and worked fine with Ubuntu kernels
6.2.x, 6.5.x, 6.8.x and 6.11.x.

This system also runs fine on Fedora 41 with kernels 6.13.x and 6.15.x,
but not with any of the 6.14.x kernels from Fedora 42.

Unable to capture anything useful without mce=off and pci=noaer: the web
console and serial console just stop when it hangs, without printing
anything before the reboot...

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2106553

Title:
  Epyc Genoa system unable to boot starting with Kernel 6.14.0

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I have an Epyc Genoa system that won't boot with either the 6.14.0 or
  6.14.1 kernels without mce=off. The system is only a few months old,
  and has run without issue on all kernels up to the most recent 6.13.10
  release. Starting with kernel 6.14.0, the system is unable to boot,
  apparently due to a CPU cache issue (based on my reading of the
  journalctl -b logs). I don't believe this is actually a hardware issue
  as reported by the logs, since I have run thorough stress tests on
  both the CPU and memory without any problem since first coming across
  this. I have reached out to the board manufacturer, and there are no
  newer BIOS updates available. System details are below.

  CPU: AMD EPYC 9554 (microcode 0x0a101148)
  Motherboard: ASRock Rack GENOAD8X-2T/BCM (BIOS Firmware Version 10.05, BMC Firmware Version 10.02.00)
  Memory: 8x64GB MICRON DDR5 RDIMM
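
The CPUID Info dump in the BERT record that follows starts with the dword
00a10f11. Assuming this is the EAX value returned by CPUID leaf 1 (which is
how the UEFI CPER format describes the field), it can be split into family,
model and stepping. A rough Python sketch, with a function name chosen only
for illustration:

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model supplies the upper nibble for base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


family, model, stepping = decode_cpuid_signature(0x00A10F11)
print(f"family {family:#x}, model {model:#x}, stepping {stepping}")
```

For 00a10f11 this gives family 0x19, model 0x11, stepping 1, which is
consistent with the Genoa part named in the report.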

  [    2.016623] BERT: Error records from previous boot:
  [    2.019601] [Hardware Error]: event severity: fatal
  [    2.022688] [Hardware Error]:  Error 0, type: fatal
  [    2.026173] [Hardware Error]:  fru_text: ProcessorError
  [    2.028867] [Hardware Error]:   section_type: IA32/X64 processor error
  [    2.031706] [Hardware Error]:   Local APIC_ID: 0x0
  [    2.033879] [Hardware Error]:   CPUID Info:
  [    2.036475] [Hardware Error]:   00000000: 00a10f11 00000000 00800800 00000000
  [    2.038856] [Hardware Error]:   00000010: 76fa320b 00000000 178bfbff 00000000
  [    2.040891] [Hardware Error]:   00000020: 00000000 00000000 00000000 00000000
  [    2.043879] [Hardware Error]:   Error Information Structure 0:
  [    2.045883] [Hardware Error]:    Error Structure Type: cache error
  [    2.047903] [Hardware Error]:    Check Information: 0x000000000602001f
  [    2.050184] [Hardware Error]:     Transaction Type: 2, Generic
  [    2.052881] [Hardware Error]:     Operation: 0, generic error
  [    2.054882] [Hardware Error]:     Level: 0
  [    2.057883] [Hardware Error]:     Processor Context Corrupt: true
  [    2.059883] [Hardware Error]:     Uncorrected: true
  [    2.061899] [Hardware Error]:   Context Information Structure 0:
  [    2.063883] [Hardware Error]:    Register Context Type: MSR Registers (Machine Check and other MSRs)
  [    2.067872] usb 1-1: new high-speed USB device number 2 using xhci_hcd
  [    2.067891] [Hardware Error]:    Register Array Size: 0x0050
  [    2.073207] [Hardware Error]:    MSR Address: 0xc0002051
  [    2.076587] [Hardware Error]:   Context Information Structure 1:
  [    2.078873] [Hardware Error]:    Register Context Type: Unclassified Data
  [    2.081350] [Hardware Error]:    Register Array Size: 0x0010
  [    2.083291] [Hardware Error]:    Register Array:
  [    2.085194] [Hardware Error]:    00000000: 00000010 00000000 1c3010c0 fffffffe
  [    2.087887] BERT: Total records found: 1
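
The Check Information value in the record above (0x000000000602001f) can be
decoded by hand to confirm the fields the kernel printed. The sketch below
assumes the IA32/X64 cache check bit layout from the UEFI CPER specification
(low 16 bits are validity flags, followed by transaction type, operation,
level, and the two status bits); the function and table names are
illustrative, not taken from any library:

```python
TRANSACTION_TYPES = {0: "Instruction", 1: "Data Access", 2: "Generic"}
OPERATIONS = {
    0: "generic error", 1: "generic read", 2: "generic write",
    3: "data read", 4: "data write", 5: "instruction fetch",
    6: "prefetch", 7: "eviction", 8: "snoop",
}


def decode_cache_check(check):
    """Decode the fields of a cache check value that are flagged as valid."""
    valid = check & 0xFFFF  # bits 0-15: one validity flag per field
    fields = {}
    if valid & 0x01:
        fields["transaction_type"] = (check >> 16) & 0x3
    if valid & 0x02:
        fields["operation"] = (check >> 18) & 0xF
    if valid & 0x04:
        fields["level"] = (check >> 22) & 0x7
    if valid & 0x08:
        fields["processor_context_corrupt"] = bool((check >> 25) & 1)
    if valid & 0x10:
        fields["uncorrected"] = bool((check >> 26) & 1)
    return fields


decoded = decode_cache_check(0x000000000602001F)
print(decoded)
print(TRANSACTION_TYPES.get(decoded["transaction_type"]),
      OPERATIONS.get(decoded["operation"]))
```

With that layout the value decodes to transaction type 2 (Generic),
operation 0 (generic error), level 0, processor context corrupt, and
uncorrected, matching the log lines.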

  ProblemType: Bug
  DistroRelease: Ubuntu 25.04
  Package: linux-image-6.14.0-13-generic 6.14.0-13.13
  ProcVersionSignature: Ubuntu 6.14.0-13.13-generic 6.14.0
  Uname: Linux 6.14.0-13-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version k6.14.0-13-generic.
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.32.0-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/controlC2', '/dev/snd/controlC1', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/hwC2D0', '/dev/snd/pcmC2D9p', '/dev/snd/hwC1D0', '/dev/snd/pcmC2D8p', '/dev/snd/pcmC1D9p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC2D7p', '/dev/snd/pcmC1D8p', '/dev/snd/pcmC0D3p', '/dev/snd/pcmC2D3p', '/dev/snd/pcmC1D7p', '/dev/snd/pcmC1D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
  Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
  Card1.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
  Card1.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
  Card2.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
  Card2.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
  CasperMD5CheckResult: pass
  CurrentDmesg: Error: command ['dmesg'] failed with exit code 1: dmesg: read kernel buffer failed: Operation not permitted
  Date: Tue Apr  8 23:49:59 2025
  InstallationDate: Installed on 2025-04-08 (0 days ago)
  InstallationMedia: Ubuntu-Server 25.04 "Plucky Puffin" - Daily amd64 (20250324)
  MachineType: To Be Filled By O.E.M. GENOAD8X-2T/BCM
  ProcEnviron:
   LANG=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
   XDG_RUNTIME_DIR=<set>
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.14.0-13-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M mce=off
  RelatedPackageVersions:
   linux-restricted-modules-6.14.0-13-generic N/A
   linux-backports-modules-6.14.0-13-generic  N/A
   linux-firmware                             20250317.git1d4c88ee-0ubuntu1
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  acpidump:

  dmi.bios.date: 12/12/2024
  dmi.bios.release: 5.27
  dmi.bios.vendor: American Megatrends International, LLC.
  dmi.bios.version: 10.05
  dmi.board.name: GENOAD8X-2T/BCM
  dmi.board.vendor: ASRockRack
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 17
  dmi.chassis.vendor: To Be Filled By O.E.M.
  dmi.chassis.version: To Be Filled By O.E.M.
  dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvr10.05:bd12/12/2024:br5.27:svnToBeFilledByO.E.M.:pnGENOAD8X-2T/BCM:pvrToBeFilledByO.E.M.:rvnASRockRack:rnGENOAD8X-2T/BCM:rvr:cvnToBeFilledByO.E.M.:ct17:cvrToBeFilledByO.E.M.:skuToBeFilledByO.E.M.:
  dmi.product.family: To Be Filled By O.E.M.
  dmi.product.name: GENOAD8X-2T/BCM
  dmi.product.sku: To Be Filled By O.E.M.
  dmi.product.version: To Be Filled By O.E.M.
  dmi.sys.vendor: To Be Filled By O.E.M.

