Thanks for testing! I will SRU this to the Disco kernel. https://lists.ubuntu.com/archives/kernel-team/2019-December/106569.html
** Description changed:

- Using Linux kernel, When inject 1bit ecc error, there are some mce log
- recorded in the dmesg.like:
+ == SRU Justification ==
+ With the 5.0 Disco kernel, no MCE log is recorded when a 1-bit ECC
+ error is injected.
+
+ == Fix ==
+ * 09cbd219 (RAS/CEC: Increment cec_entered under the mutex lock)
+ * de0e0624 (RAS/CEC: Check count_threshold unconditionally)
+
+ Commit de0e0624 is the actual fix for this issue; 09cbd219 fixes a
+ race condition and lets de0e0624 apply as a clean cherry-pick.
+
+ Both commits have already landed in newer kernels.
+
+ == Test ==
+ A test kernel can be found here:
+ https://people.canonical.com/~phlin/kernel/lp-1857413-ras-err-msg/
+
+ The bug reporter, fan jinke, verified that the patched kernel logs
+ the error correctly.
+
+ == Regression Potential ==
+ Low. The changes are limited to the RAS Correctable Errors Collector,
+ and the fix has been verified to work as expected.
+
+
+ == Original Bug Report ==
+ Using the Linux kernel, when a 1-bit ECC error is injected, MCE log entries are recorded in dmesg, like:
  
  [ 1561.511210] mce: [Hardware Error]: Machine check events logged
  [ 1561.511221] [Hardware Error]: Corrected error, no action required.
  [ 1561.511311] [Hardware Error]: CPU:0 (18:0:2) MC16_STATUS[Over|CE|MiscV|-|AddrV|-|-|SyndV|-|CECC]: 0xdc2040000000011b
  [ 1561.511388] [Hardware Error]: Error Addr: 0x000000077cd66940
  [ 1561.511439] [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x000010ce0a400d01
  [ 1561.511499] [Hardware Error]: Unified Memory Controller Extended Error Code: 0
  [ 1561.511556] [Hardware Error]: Unified Memory Controller Error: DRAM ECC error.
  [ 1561.511646] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7fcd66 offset:0x940 grain:0 syndrome:0x10ce)
  [ 1561.511648] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
  
  *But no such log is recorded when using "Ubuntu 18.04.3 LTS"*
  
  The related upstream commit is de0e0624d86ff9fc512dedb297f8978698abf21a.
  After merging this commit, the Ubuntu kernel's dmesg records the MCE log as well.
- ---
+ ---
  ProblemType: Bug
  AlsaDevices:
-  total 0
-  crw-rw----+ 1 root audio 116,  1 Dec 24 17:20 seq
-  crw-rw----+ 1 root audio 116, 33 Dec 24 17:20 timer
+  total 0
+  crw-rw----+ 1 root audio 116,  1 Dec 24 17:20 seq
+  crw-rw----+ 1 root audio 116, 33 Dec 24 17:20 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.10-0ubuntu27
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 19.04
  InstallationDate: Installed on 2019-12-24 (0 days ago)
  InstallationMedia: Ubuntu-Server 19.04 "Disco Dingo" - Release amd64 (20190416.1)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  MachineType: Sugon HygonH210
  Package: linux (not installed)
  PciMultimedia:
-
+
  ProcEnviron:
-  TERM=linux
-  PATH=(custom, no user)
-  LANG=en_US.UTF-8
-  SHELL=/bin/bash
+  TERM=linux
+  PATH=(custom, no user)
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.0.0-13-generic root=UUID=43f8bc11-d850-4e79-9d14-1232ef50040f ro
  ProcVersionSignature: Ubuntu 5.0.0-13.14-generic 5.0.6
  RelatedPackageVersions:
-  linux-restricted-modules-5.0.0-13-generic N/A
-  linux-backports-modules-5.0.0-13-generic  N/A
-  linux-firmware                            1.178
+  linux-restricted-modules-5.0.0-13-generic N/A
+  linux-backports-modules-5.0.0-13-generic  N/A
+  linux-firmware                            1.178
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  Tags: disco
  Uname: Linux 5.0.0-13-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
-
+
  _MarkForUpload: True
  dmi.bios.date: 03/15/2019
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 210ER119
  dmi.board.asset.tag: Default string
  dmi.board.name: HygonH210
  dmi.board.vendor: Sugon
  dmi.board.version: Default string
  dmi.chassis.asset.tag: Default string
  dmi.chassis.type: 17
  dmi.chassis.vendor: Sugon
  dmi.chassis.version: Default string
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr210ER119:bd03/15/2019:svnSugon:pnHygonH210:pvrDefaultstring:rvnSugon:rnHygonH210:rvrDefaultstring:cvnSugon:ct17:cvrDefaultstring:
  dmi.product.family: Rack
  dmi.product.name: HygonH210
  dmi.product.sku: Default string
  dmi.product.version: Default string
  dmi.sys.vendor: Sugon

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1857413

Title:
  mce: ras: When inject 1bit ecc error, there is no mce log recorded in
  the dmesg