Hi Philip,

The crash was happening on u-12tb1.112xlarge instances (448 cores, 12TB
RAM), roughly once every few hours on average when the system was busy.

Our systems worked fine for months after the upgrade to 24.04. Then we
started experiencing (different) crashes with the now-retracted
6.8.0-1011. We went back to 1010, and that is when these crashes appeared.

Yesterday we upgraded all the servers to 1012 and have had zero crashes
since, so perhaps 1012 fixes this issue, directly or indirectly.
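To check that a given host has actually moved past 1010/1011, one option is to compare the ABI number in the kernel release string. This is a minimal sketch that assumes Ubuntu's MAJOR.MINOR.PATCH-ABI-FLAVOUR naming (e.g. 6.8.0-1012-aws); the helpers `parse_release` and `is_at_least` are hypothetical, and 1012 is used only because it is the release reported above as crash-free so far, not a confirmed fix.

```python
import platform
import re
from typing import Tuple

# Assumed Ubuntu kernel release format: MAJOR.MINOR.PATCH-ABI-FLAVOUR,
# e.g. "6.8.0-1012-aws". This pattern is an illustration, not an official spec.
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-([a-z0-9]+)$")


def parse_release(release: str) -> Tuple[int, int, int, int]:
    """Hypothetical helper: split a release string into (major, minor, patch, abi)."""
    m = _RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(g) for g in m.groups()[:4])
    return major, minor, patch, abi


def is_at_least(release: str, minimum: str = "6.8.0-1012-aws") -> bool:
    """Hypothetical helper: True if `release` sorts at or above `minimum`.

    Tuples compare the base version first and the ABI number second; the
    flavour suffix ("aws", "generic", ...) is ignored.
    """
    return parse_release(release) >= parse_release(minimum)


if __name__ == "__main__":
    current = platform.release()  # the same string `uname -r` prints
    try:
        verdict = "OK" if is_at_least(current) else "older than 6.8.0-1012"
    except ValueError as exc:
        verdict = str(exc)
    print(current, verdict)
```

Comparing tuples keeps 6.8.0-1013 above 6.8.0-1012 without any string tricks; the flavour is deliberately ignored, so an -aws host is not compared against a -generic one in any special way.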

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-aws in Ubuntu.
https://bugs.launchpad.net/bugs/2076062

Title:
  6.8.0-1010-aws repeating BUG: kernel NULL pointer dereference, address:
  00000000000000a0

Status in linux-signed-aws package in Ubuntu:
  New

Bug description:
  We have a repeating (every few hours) kernel OOPS: 
  BUG: kernel NULL pointer dereference, address: 00000000000000a0.

  This repeats on several different machines, so it is not a hardware
  issue.

  Five dmesg logs and kdump reports are attached as a file.
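  To confirm that each attached dmesg log shows the same signature, and how long after boot each crash hit, one can scan for the NULL-dereference BUG: line and the kernel-mode RIP: line that follows it. This is a minimal sketch assuming the usual "[seconds.micros]" dmesg timestamp prefix; the "[  T52862]" field seen in these logs is treated as an optional printk caller ID (an assumption). `find_oopses` and `Oops` are hypothetical names.

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Timestamp "[ 6315.768576]" followed by an optional caller ID "[  T52862]"
# (assumption: the caller ID appears only when the kernel is built with
# CONFIG_PRINTK_CALLER, so it is not required here).
_PREFIX = r"^\s*\[\s*(?P<ts>\d+\.\d+)\](?:\s*\[\s*[TC]\d+\])?\s*"
BUG_RE = re.compile(_PREFIX + r"BUG: kernel NULL pointer dereference, address: (?P<addr>[0-9a-f]+)")
# Only kernel-mode RIP lines (code segment 0010), e.g. "RIP: 0010:pick_next_task_fair+0x91/0x620".
RIP_RE = re.compile(_PREFIX + r"RIP: 0010:(?P<sym>[\w.]+)\+")


@dataclass
class Oops:
    uptime_s: float        # seconds since boot, from the dmesg timestamp
    address: int           # faulting address from the BUG: line
    symbol: Optional[str]  # function from the first kernel-mode RIP: line after it


def find_oopses(lines: Iterable[str]) -> List[Oops]:
    """Hypothetical helper: pair each NULL-dereference BUG: line with the next kernel RIP: line."""
    found: List[Oops] = []
    for line in lines:
        bug = BUG_RE.match(line)
        if bug:
            found.append(Oops(float(bug["ts"]), int(bug["addr"], 16), None))
            continue
        rip = RIP_RE.match(line)
        if rip and found and found[-1].symbol is None:
            found[-1].symbol = rip["sym"]
    return found


if __name__ == "__main__":
    for path in sys.argv[1:]:  # e.g. the attached dmesg files
        with open(path, errors="replace") as fh:
            for oops in find_oopses(fh):
                print(f"{path}: {oops.uptime_s / 3600:.2f} h after boot, "
                      f"address {oops.address:#x} in {oops.symbol}")
```

  For the excerpt below, the timestamp 6315.77 s corresponds to roughly 1.75 hours of uptime, which fits the "every few hours" pattern.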

  [ 6315.768576] [  T52862] #PF: supervisor read access in kernel mode
  [ 6315.770893] [  T52862] #PF: error_code(0x0000) - not-present page
  [ 6315.773266] [  T52862] PGD 0 P4D 0
  [ 6315.774528] [  T52862] Oops: 0000 [#1] SMP NOPTI
  [ 6315.776262] [  T52862] CPU: 399 PID: 52862 Comm: python Kdump: loaded Tainted: P           O       6.8.0-1010-aws #10-Ubuntu
  [ 6315.780647] [  T52862] Hardware name: Amazon EC2 u-12tb1.112xlarge/, BIOS 1.0 10/16/2017
  [ 6315.783698] [  T52862] RIP: 0010:pick_next_task_fair+0x91/0x620
  [ 6315.785810] [  T52862] Code: 91 00 00 00 49 81 bd b0 02 00 00 00 3b 6f 91 75 60 4d 89 fe eb 27 4c 89 f7 e8 5b b3 ff ff 84 c0 75 3f 4c 89 f7 e8 8f fb fe ff <4c> 8b b0 a0 00 00 00 48 89 c3 4d 85 f6 0f 8>
  [ 6315.793288] [  T52862] RSP: 0018:ffffbcf57feafb20 EFLAGS: 00010046
  [ 6315.795539] [  T52862] RAX: 0000000000000000 RBX: ffffbcf57feafbf0 RCX: 0000000000000000
  [ 6315.798481] [  T52862] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  [ 6315.801402] [  T52862] RBP: ffffbcf57feafb60 R08: 0000000000000000 R09: 0000000000000000
  [ 6315.804358] [  T52862] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa64bb27b47c0
  [ 6315.807298] [  T52862] R13: ffff9ed3e4a88000 R14: ffffa64bb27b48c0 R15: ffffa64bb27b48c0
  [ 6315.810210] [  T52862] FS:  00007635af6006c0(0000) GS:ffffa64bb2780000(0000) knlGS:0000000000000000
  [ 6315.813512] [  T52862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 6315.817797] [  T52862] CR2: 00000000000000a0 CR3: 000003029b5dc002 CR4: 00000000007706f0
  [ 6315.822588] [  T52862] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 6315.827353] [  T52862] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 6315.832121] [  T52862] PKRU: 55555554
  [ 6315.835228] [  T52862] Call Trace:
  [ 6315.838210] [  T52862]  <TASK>
  [ 6315.841100] [  T52862]  ? show_regs+0x6d/0x80
  [ 6315.844493] [  T52862]  ? __die+0x24/0x80
  [ 6315.847706] [  T52862]  ? page_fault_oops+0x99/0x1b0
  [ 6315.851278] [  T52862]  ? do_user_addr_fault+0x2d8/0x6a0
  [ 6315.855048] [  T52862]  ? exc_page_fault+0x83/0x190
  [ 6315.858600] [  T52862]  ? asm_exc_page_fault+0x27/0x30
  [ 6315.862258] [  T52862]  ? pick_next_task_fair+0x91/0x620
  [ 6315.865961] [  T52862]  pick_next_task+0x5f/0xce0
  [ 6315.869496] [  T52862]  __schedule+0x16e/0x790
  [ 6315.872845] [  T52862]  ? os_xsave+0x2e/0x70
  [ 6315.876108] [  T52862]  schedule+0x2c/0xf0
  [ 6315.879234] [  T52862]  futex_wait_queue+0x65/0xa0
  [ 6315.882626] [  T52862]  __futex_wait+0x155/0x1d0
  [ 6315.885917] [  T52862]  ? __pfx_futex_wake_mark+0x10/0x10
  [ 6315.889487] [  T52862]  futex_wait+0x74/0x120
  [ 6315.892690] [  T52862]  ? __pfx_hrtimer_wakeup+0x10/0x10
  [ 6315.896211] [  T52862]  do_futex+0x105/0x260
  [ 6315.899355] [  T52862]  __x64_sys_futex+0x12a/0x200
  [ 6315.902692] [  T52862]  ? __x64_sys_futex+0x12a/0x200
  [ 6315.906094] [  T52862]  x64_sys_call+0x1ce7/0x25c0
  [ 6315.909394] [  T52862]  do_syscall_64+0x7f/0x180
  [ 6315.912631] [  T52862]  ? do_syscall_64+0x8c/0x180
  [ 6315.915970] [  T52862]  ? switch_fpu_return+0x51/0xd0
  [ 6315.919359] [  T52862]  ? syscall_exit_to_user_mode+0x86/0x260
  [ 6315.923069] [  T52862]  ? do_syscall_64+0x8c/0x180
  [ 6315.926410] [  T52862]  entry_SYSCALL_64_after_hwframe+0x78/0x80
  [ 6315.930158] [  T52862] RIP: 0033:0x7635b3e98d61
  [ 6315.933439] [  T52862] Code: 48 89 4d c8 e8 00 f8 ff ff 4c 8b 55 c8 8b 75 dc 45 31 c0 41 89 c5 48 8b 7d d0 41 b9 ff ff ff ff 44 89 e2 b8 ca 00 00 00 0f 05 <44> 89 ef 48 89 c3 e8 54 f8 ff ff e9 65 ff f>
  [ 6315.945541] [  T52862] RSP: 002b:00007635af5fde10 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
  [ 6315.951915] [  T52862] RAX: ffffffffffffffda RBX: 00005a22f1a804f0 RCX: 00007635b3e98d61
  [ 6315.956420] [  T52862] RDX: 0000000000000000 RSI: 0000000000000089 RDI: 00005a22f1a80518
  [ 6315.960936] [  T52862] RBP: 00007635af5fde50 R08: 0000000000000000 R09: 00000000ffffffff
  [ 6315.965421] [  T52862] R10: 00007635af5fdf60 R11: 0000000000000246 R12: 0000000000000000
  [ 6315.969939] [  T52862] R13: 0000000000000000 R14: 00005a22f1a80520 R15: 00005a22f1a80518
  [ 6315.974409] [  T52862]  </TASK>
  [ 6315.977123] [  T52862] Modules linked in: cpuid xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_ad>
  [ 6316.018702] [  T52862] CR2: 00000000000000a0
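  The register dump is consistent with the faulting instruction. The bytes starting at the <4c> marker in the kernel Code: line, 4c 8b b0 a0 00 00 00, decode as mov r14, [rax+0xa0]. With RAX = 0 (presumably the return value of the call e8 8f fb fe ff just before it), the load targets 0xa0, matching CR2 and the reported address. The sketch below hand-decodes only this narrow encoding (REX.W + opcode 0x8B, ModRM mod=10, no SIB byte) to make the arithmetic explicit; `decode_mov_load` is a hypothetical helper, not a general disassembler, and the kernel's scripts/decodecode or objdump is the proper tool for real analysis.

```python
# x86-64 general-purpose registers in encoding order (0-15).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code: bytes):
    """Decode only REX.W + 8B /r with mod=10 and no SIB: mov reg, [base + disp32].

    Returns (destination, base, displacement); any other encoding raises ValueError.
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if (rex & 0xF0) != 0x40 or not (rex & 0x08) or opcode != 0x8B:
        raise ValueError("not a 64-bit MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only mod=10 without a SIB byte is handled")
    reg |= (rex & 0x04) << 1  # REX.R selects r8-r15 for the destination
    rm |= (rex & 0x01) << 3   # REX.B selects r8-r15 for the base
    disp = int.from_bytes(code[3:7], "little", signed=True)
    return REGS64[reg], REGS64[rm], disp


# Bytes from the "<4c>" marker in the kernel Code: line above.
faulting = bytes.fromhex("4c8bb0a0000000")
regs = {"rax": 0x0000000000000000}  # RAX from the register dump

dest, base, disp = decode_mov_load(faulting)
addr = (regs[base] + disp) & 0xFFFF_FFFF_FFFF_FFFF  # wrap to 64 bits
print(f"mov {dest}, [{base}{disp:+#x}] -> loads from {addr:#018x}")
# prints: mov r14, [rax+0xa0] -> loads from 0x00000000000000a0
```

  In other words, the oops is a read through a NULL structure pointer at field offset 0xa0, which is why the same address appears in every report.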

  ProblemType: Bug
  DistroRelease: Ubuntu 24.04
  Package: linux-image-6.8.0-1012-aws 6.8.0-1012.13
  ProcVersionSignature: Ubuntu 6.8.0-1012.13-aws 6.8.8
  Uname: Linux 6.8.0-1012-aws x86_64
  NonfreeKernelModules: zfs
  ApportVersion: 2.28.1-0ubuntu3
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CloudArchitecture: x86_64
  CloudBuildName: server
  CloudID: aws
  CloudName: aws
  CloudPlatform: ec2
  CloudRegion: us-east-1
  CloudSerial: 20220912
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Aug  5 08:34:06 2024
  Ec2Architecture: x86_64
  Ec2Imageid: ami-08c40ec9ead489470
  Ec2Instancetype: u-12tb1.112xlarge
  Ec2Region: us-east-1
  ProcEnviron:
   LANG=C.UTF-8
   LC_CTYPE=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
  SourcePackage: linux-signed-aws
  UpgradeStatus: Upgraded to noble on 2024-06-06 (59 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-signed-aws/+bug/2076062/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
