Hi Philip,

The crash was happening on u-12tb1.112xlarge instances (448 cores, 12TB RAM), and would happen once every few hours on average when the system was busy.
Our systems worked fine for months after the upgrade to 24.04. Then we started experiencing (different) crashes with the now-retracted 6.8.0-1011. We went back to 1010, and that is when these crashes appeared. We upgraded to 1012 on all the servers yesterday and have had zero crashes since, so perhaps 1012 fixes this issue directly or indirectly.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-signed-aws in Ubuntu.
https://bugs.launchpad.net/bugs/2076062

Title:
  6.8.0-1010-aws repeating UG: kernel NULL pointer dereference, address: 00000000000000a0

Status in linux-signed-aws package in Ubuntu:
  New

Bug description:
  We have a repeating (every few hours) kernel OOPS: BUG: kernel NULL pointer dereference, address: 00000000000000a0. This repeats on several different pieces of hardware, so this is not a hardware issue.

  Attached as a file are 5 dmesg logs and kdump reports.

  [ 6315.768576] [ T52862] #PF: supervisor read access in kernel mode
  [ 6315.770893] [ T52862] #PF: error_code(0x0000) - not-present page
  [ 6315.773266] [ T52862] PGD 0 P4D 0
  [ 6315.774528] [ T52862] Oops: 0000 [#1] SMP NOPTI
  [ 6315.776262] [ T52862] CPU: 399 PID: 52862 Comm: python Kdump: loaded Tainted: P O 6.8.0-1010-aws #10-Ubuntu
  [ 6315.780647] [ T52862] Hardware name: Amazon EC2 u-12tb1.112xlarge/, BIOS 1.0 10/16/2017
  [ 6315.783698] [ T52862] RIP: 0010:pick_next_task_fair+0x91/0x620
  [ 6315.785810] [ T52862] Code: 91 00 00 00 49 81 bd b0 02 00 00 00 3b 6f 91 75 60 4d 89 fe eb 27 4c 89 f7 e8 5b b3 ff ff 84 c0 75 3f 4c 89 f7 e8 8f fb fe ff <4c> 8b b0 a0 00 00 00 48 89 c3 4d 85 f6 0f 8>
  [ 6315.793288] [ T52862] RSP: 0018:ffffbcf57feafb20 EFLAGS: 00010046
  [ 6315.795539] [ T52862] RAX: 0000000000000000 RBX: ffffbcf57feafbf0 RCX: 0000000000000000
  [ 6315.798481] [ T52862] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  [ 6315.801402] [ T52862] RBP: ffffbcf57feafb60 R08: 0000000000000000 R09: 0000000000000000
  [ 6315.804358] [ T52862] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa64bb27b47c0
  [ 6315.807298] [ T52862] R13: ffff9ed3e4a88000 R14: ffffa64bb27b48c0 R15: ffffa64bb27b48c0
  [ 6315.810210] [ T52862] FS: 00007635af6006c0(0000) GS:ffffa64bb2780000(0000) knlGS:0000000000000000
  [ 6315.813512] [ T52862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 6315.817797] [ T52862] CR2: 00000000000000a0 CR3: 000003029b5dc002 CR4: 00000000007706f0
  [ 6315.822588] [ T52862] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 6315.827353] [ T52862] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 6315.832121] [ T52862] PKRU: 55555554
  [ 6315.835228] [ T52862] Call Trace:
  [ 6315.838210] [ T52862] <TASK>
  [ 6315.841100] [ T52862] ? show_regs+0x6d/0x80
  [ 6315.844493] [ T52862] ? __die+0x24/0x80
  [ 6315.847706] [ T52862] ? page_fault_oops+0x99/0x1b0
  [ 6315.851278] [ T52862] ? do_user_addr_fault+0x2d8/0x6a0
  [ 6315.855048] [ T52862] ? exc_page_fault+0x83/0x190
  [ 6315.858600] [ T52862] ? asm_exc_page_fault+0x27/0x30
  [ 6315.862258] [ T52862] ? pick_next_task_fair+0x91/0x620
  [ 6315.865961] [ T52862] pick_next_task+0x5f/0xce0
  [ 6315.869496] [ T52862] __schedule+0x16e/0x790
  [ 6315.872845] [ T52862] ? os_xsave+0x2e/0x70
  [ 6315.876108] [ T52862] schedule+0x2c/0xf0
  [ 6315.879234] [ T52862] futex_wait_queue+0x65/0xa0
  [ 6315.882626] [ T52862] __futex_wait+0x155/0x1d0
  [ 6315.885917] [ T52862] ? __pfx_futex_wake_mark+0x10/0x10
  [ 6315.889487] [ T52862] futex_wait+0x74/0x120
  [ 6315.892690] [ T52862] ? __pfx_hrtimer_wakeup+0x10/0x10
  [ 6315.896211] [ T52862] do_futex+0x105/0x260
  [ 6315.899355] [ T52862] __x64_sys_futex+0x12a/0x200
  [ 6315.902692] [ T52862] ? __x64_sys_futex+0x12a/0x200
  [ 6315.906094] [ T52862] x64_sys_call+0x1ce7/0x25c0
  [ 6315.909394] [ T52862] do_syscall_64+0x7f/0x180
  [ 6315.912631] [ T52862] ? do_syscall_64+0x8c/0x180
  [ 6315.915970] [ T52862] ? switch_fpu_return+0x51/0xd0
  [ 6315.919359] [ T52862] ? syscall_exit_to_user_mode+0x86/0x260
  [ 6315.923069] [ T52862] ? do_syscall_64+0x8c/0x180
  [ 6315.926410] [ T52862] entry_SYSCALL_64_after_hwframe+0x78/0x80
  [ 6315.930158] [ T52862] RIP: 0033:0x7635b3e98d61
  [ 6315.933439] [ T52862] Code: 48 89 4d c8 e8 00 f8 ff ff 4c 8b 55 c8 8b 75 dc 45 31 c0 41 89 c5 48 8b 7d d0 41 b9 ff ff ff ff 44 89 e2 b8 ca 00 00 00 0f 05 <44> 89 ef 48 89 c3 e8 54 f8 ff ff e9 65 ff f>
  [ 6315.945541] [ T52862] RSP: 002b:00007635af5fde10 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
  [ 6315.951915] [ T52862] RAX: ffffffffffffffda RBX: 00005a22f1a804f0 RCX: 00007635b3e98d61
  [ 6315.956420] [ T52862] RDX: 0000000000000000 RSI: 0000000000000089 RDI: 00005a22f1a80518
  [ 6315.960936] [ T52862] RBP: 00007635af5fde50 R08: 0000000000000000 R09: 00000000ffffffff
  [ 6315.965421] [ T52862] R10: 00007635af5fdf60 R11: 0000000000000246 R12: 0000000000000000
  [ 6315.969939] [ T52862] R13: 0000000000000000 R14: 00005a22f1a80520 R15: 00005a22f1a80518
  [ 6315.974409] [ T52862] </TASK>
  [ 6315.977123] [ T52862] Modules linked in: cpuid xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_ad>
  [ 6316.018702] [ T52862] CR2: 00000000000000a0

  ProblemType: Bug
  DistroRelease: Ubuntu 24.04
  Package: linux-image-6.8.0-1012-aws 6.8.0-1012.13
  ProcVersionSignature: Ubuntu 6.8.0-1012.13-aws 6.8.8
  Uname: Linux 6.8.0-1012-aws x86_64
  NonfreeKernelModules: zfs
  ApportVersion: 2.28.1-0ubuntu3
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudBuildName: server
  CloudID: aws
  CloudName: aws
  CloudPlatform: ec2
  CloudRegion: us-east-1
  CloudSerial: 20220912
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Aug 5 08:34:06 2024
  Ec2Architecture: x86_64
  Ec2Imageid: ami-08c40ec9ead489470
  Ec2Instancetype: u-12tb1.112xlarge
  Ec2Region: us-east-1
  ProcEnviron:
   LANG=C.UTF-8
   LC_CTYPE=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
  SourcePackage: linux-signed-aws
  UpgradeStatus: Upgraded to noble on 2024-06-06 (59 days ago)
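
Since the report mentions five attached dmesg logs from different machines, one way to check that they all show the same fault is to pull the faulting kernel symbol and the CR2 address out of each log and compare them. The following is only a rough sketch under assumptions: the function name oops_signature and both regular expressions are illustrative, written against the log format quoted in this report, and are not part of any existing tool.

```python
import re

# Kernel-mode RIP lines use code segment 0010, e.g.
# "RIP: 0010:pick_next_task_fair+0x91/0x620".
# The user-mode line ("RIP: 0033:0x...") is deliberately not matched.
RIP_RE = re.compile(r"RIP: 0010:(\S+)")
# CR2 holds the faulting address, e.g. "CR2: 00000000000000a0".
CR2_RE = re.compile(r"CR2: ([0-9a-f]{16})")


def oops_signature(dmesg_text):
    """Return (kernel symbol+offset, fault address); None where not found."""
    rip = RIP_RE.search(dmesg_text)
    cr2 = CR2_RE.search(dmesg_text)
    symbol = rip.group(1) if rip else None
    address = int(cr2.group(1), 16) if cr2 else None
    return symbol, address


# Three lines copied from the trace in this report.
SAMPLE = """\
[ 6315.783698] [ T52862] RIP: 0010:pick_next_task_fair+0x91/0x620
[ 6315.817797] [ T52862] CR2: 00000000000000a0 CR3: 000003029b5dc002 CR4: 00000000007706f0
[ 6315.930158] [ T52862] RIP: 0033:0x7635b3e98d61
"""

symbol, address = oops_signature(SAMPLE)
print(symbol, hex(address))  # prints: pick_next_task_fair+0x91/0x620 0xa0
```

Running this over each attached log and comparing the resulting pairs would show whether every crash faults at the same offset in pick_next_task_fair with the same address.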