I forgot to include that the reason we disabled both sysctls
(tcp_early_retrans and tcp_recovery) is because we initially just
disabled the TLP case by setting early retrans to 0, but then we hit a
second crash in the same spot inside of tcp_rearm_rto from a different
call path. This time in the REO case:
Aug 7 07:26:16 rx [1006006.265582] BUG: kernel NULL pointer dereference,
address: 0000000000000020
Aug 7 07:26:16 rx [1006006.272719] #PF: supervisor read access in kernel mode
Aug 7 07:26:16 rx [1006006.278030] #PF: error_code(0x0000) - not-present page
Aug 7 07:26:16 rx [1006006.283343] PGD 0 P4D 0
Aug 7 07:26:16 rx [1006006.286057] Oops: 0000 [#1] SMP NOPTI
Aug 7 07:26:16 rx [1006006.289896] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G
W 5.4.0-174-generic #193-Ubuntu
Aug 7 07:26:16 rx [1006006.299107] Hardware name: Supermicro SMC 2x26 os-gen8
64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
Aug 7 07:26:16 rx [1006006.309970] RIP: 0010:tcp_rearm_rto+0xe4/0x160
Aug 7 07:26:16 rx [1006006.314584] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d
c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8
0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3
Aug 7 07:26:16 rx [1006006.333499] RSP: 0018:ffffb42600a50960 EFLAGS: 00010246
Aug 7 07:26:16 rx [1006006.338895] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf
RCX: 0000000000000000
Aug 7 07:26:16 rx [1006006.346193] RDX: 0000000000000000 RSI: 0000000000000001
RDI: ffff92d687ed8160
Aug 7 07:26:16 rx [1006006.353489] RBP: ffffb42600a50978 R08: 0000000000000000
R09: 00000000cd896dcc
Aug 7 07:26:16 rx [1006006.360786] R10: ffff92dc3404f400 R11: 0000000000000001
R12: ffff92d687ed8000
Aug 7 07:26:16 rx [1006006.368084] R13: ffff92d687ed8160 R14: 00000000cd896dcc
R15: 00000000cd8fca81
Aug 7 07:26:16 rx [1006006.375381] FS: 0000000000000000(0000)
GS:ffff93158ad40000(0000) knlGS:0000000000000000
Aug 7 07:26:16 rx [1006006.383632] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Aug 7 07:26:16 rx [1006006.389544] CR2: 0000000000000020 CR3: 0000003e775ce006
CR4: 0000000000760ee0
Aug 7 07:26:16 rx [1006006.396839] PKRU: 55555554
Aug 7 07:26:16 rx [1006006.399717] Call Trace:
Aug 7 07:26:16 rx [1006006.402335]
Aug 7 07:26:16 rx [1006006.404525] ? show_regs.cold+0x1a/0x1f
Aug 7 07:26:16 rx [1006006.408532] ? __die+0x90/0xd9
Aug 7 07:26:16 rx [1006006.411760] ? no_context+0x196/0x380
Aug 7 07:26:16 rx [1006006.415599] ? __bad_area_nosemaphore+0x50/0x1a0
Aug 7 07:26:16 rx [1006006.420392] ? _raw_spin_lock+0x1e/0x30
Aug 7 07:26:16 rx [1006006.424401] ? bad_area_nosemaphore+0x16/0x20
Aug 7 07:26:16 rx [1006006.428927] ? do_user_addr_fault+0x267/0x450
Aug 7 07:26:16 rx [1006006.433450] ? __do_page_fault+0x58/0x90
Aug 7 07:26:16 rx [1006006.437542] ? do_page_fault+0x2c/0xe0
Aug 7 07:26:16 rx [1006006.441470] ? page_fault+0x34/0x40
Aug 7 07:26:16 rx [1006006.445134] ? tcp_rearm_rto+0xe4/0x160
Aug 7 07:26:16 rx [1006006.449145] tcp_ack+0xa32/0xb30
Aug 7 07:26:16 rx [1006006.452542] tcp_rcv_established+0x13c/0x670
Aug 7 07:26:16 rx [1006006.456981] ? sk_filter_trim_cap+0x48/0x220
Aug 7 07:26:16 rx [1006006.461419] tcp_v6_do_rcv+0xdb/0x450
Aug 7 07:26:16 rx [1006006.465257] tcp_v6_rcv+0xc2b/0xd10
Aug 7 07:26:16 rx [1006006.468918] ip6_protocol_deliver_rcu+0xd3/0x4e0
Aug 7 07:26:16 rx [1006006.473706] ip6_input_finish+0x15/0x20
Aug 7 07:26:16 rx [1006006.477710] ip6_input+0xa2/0xb0
Aug 7 07:26:16 rx [1006006.481109] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0
Aug 7 07:26:16 rx [1006006.486151] ip6_sublist_rcv_finish+0x3d/0x50
Aug 7 07:26:16 rx [1006006.490679] ip6_sublist_rcv+0x1aa/0x250
Aug 7 07:26:16 rx [1006006.494779] ? ip6_rcv_finish_core.isra.0+0xa0/0xa0
Aug 7 07:26:16 rx [1006006.499828] ipv6_list_rcv+0x112/0x140
Aug 7 07:26:16 rx [1006006.503748] __netif_receive_skb_list_core+0x1a4/0x250
Aug 7 07:26:16 rx [1006006.509057] netif_receive_skb_list_internal+0x1a1/0x2b0
Aug 7 07:26:16 rx [1006006.514538] gro_normal_list.part.0+0x1e/0x40
Aug 7 07:26:16 rx [1006006.519068] napi_complete_done+0x91/0x130
Aug 7 07:26:16 rx [1006006.523352] mlx5e_napi_poll+0x18e/0x610 [mlx5_core]
Aug 7 07:26:16 rx [1006006.528481] net_rx_action+0x142/0x390
Aug 7 07:26:16 rx [1006006.532398] __do_softirq+0xd1/0x2c1
Aug 7 07:26:16 rx [1006006.536142] irq_exit+0xae/0xb0
Aug 7 07:26:16 rx [1006006.539452] do_IRQ+0x5a/0xf0
Aug 7 07:26:16 rx [1006006.542590] common_interrupt+0xf/0xf
Aug 7 07:26:16 rx [1006006.546421]
Aug 7 07:26:16 rx [1006006.548695] RIP: 0010:native_safe_halt+0xe/0x10
Aug 7 07:26:16 rx [1006006.553399] Code: 7b ff ff ff eb bd 90 90 90 90 90 90
e9 07 00 00 00 0f 00 2d 36 2c 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 2c
50 00 fb f4 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 dd 5e 61 ff 65
Aug 7 07:26:16 rx [1006006.572309] RSP: 0018:ffffb42600177e70 EFLAGS: 00000246
ORIG_RAX: ffffffffffffffc2
Aug 7 07:26:16 rx [1006006.580040] RAX: ffffffff8ed08b20 RBX: 0000000000000005
RCX: 0000000000000001
Aug 7 07:26:16 rx [1006006.587337] RDX: 00000000f48eeca2 RSI: 0000000000000082
RDI: 0000000000000082
Aug 7 07:26:16 rx [1006006.594635] RBP: ffffb42600177e90 R08: 0000000000000000
R09: 000000000000020f
Aug 7 07:26:16 rx [1006006.601931] R10: 0000000000100000 R11: 0000000000000000
R12: 0000000000000005
Aug 7 07:26:16 rx [1006006.609229] R13: ffff93157deb5f00 R14: 0000000000000000
R15: 0000000000000000
Aug 7 07:26:16 rx [1006006.616530] ? __cpuidle_text_start+0x8/0x8
Aug 7 07:26:16 rx [1006006.620886] ? default_idle+0x20/0x140
Aug 7 07:26:16 rx [1006006.624804] arch_cpu_idle+0x15/0x20
Aug 7 07:26:16 rx [1006006.628545] default_idle_call+0x23/0x30
Aug 7 07:26:16 rx [1006006.632640] do_idle+0x1fb/0x270
Aug 7 07:26:16 rx [1006006.636035] cpu_startup_entry+0x20/0x30
Aug 7 07:26:16 rx [1006006.640126] start_secondary+0x178/0x1d0
Aug 7 07:26:16 rx [1006006.644218] secondary_startup_64+0xa4/0xb0
Aug 7 07:26:17 rx [1006006.648568] Modules linked in: vrf bridge stp llc vxlan
ip6_udp_tunnel udp_tunnel nls_iso8859_1 nft_ct amd64_edac_mod edac_mce_amd
kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd
glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet
ast mii drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea
sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler
sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables
nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 hid_generic
mlx5_core pci_hyperv_intf crc32_pclmul usbhid ahci tls mlxfw bnxt_en hid
libahci nvme i2c_piix4 nvme_core wmi [last unloaded: cpuid]
Aug 7 07:26:17 rx [1006006.726180] CR2: 0000000000000020
Aug 7 07:26:17 rx [1006006.729718] ---[ end trace e0e2e37e4e612984 ]---
Also the proposed "workaround" patch I pasted in the original writeup is
not correct. I think something like the following may be more
appropriate:
Author: Josh Hunt <[email protected]>
Date: Tue Jul 30 19:45:43 2024 -0400
tcp: check skb is non-NULL before using in tcp_rto_delta_us()
There have been multiple occassions where we have crashed in this path
because
packets_out suggested there were packets on the write or retransmit queues,
but in fact there weren't leading to a NULL skb being dereferenced. While
we
should fix that root cause we should also just make sure the skb is not NULL
before dereferencing it. Also add a warn once here to capture some
information
if/when the problem case is hit again.
Signed-off-by: Josh Hunt <[email protected]>
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2aac11e7e1cc..932f0de641e4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2434,9 +2434,18 @@ static inline s64 tcp_rto_delta_us(const struct sock *sk)
{
const struct sk_buff *skb = tcp_rtx_queue_head(sk);
u32 rto = inet_csk(sk)->icsk_rto;
- u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) +
jiffies_to_usecs(rto);
- return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
+ if (likely(skb)) {
+ u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) +
jiffies_to_usecs(rto);
+
+ return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
+ } else {
+ WARN_ONCE(1,
+ "rtx queue emtpy: inflight %u tlp_high_seq %u state
%u\n",
+ tcp_sk(sk)->packets_out, tcp_sk(sk)->tlp_high_seq,
sk->sk_state);
+ return jiffies_to_usecs(rto);
+ }
+
}
/*
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2077657
Title:
Kernel Oops - BUG: kernel NULL pointer dereference, RIP:
0010:tcp_rearm_rto+0xe4/0x160
Status in linux package in Ubuntu:
Confirmed
Bug description:
We've recently started seeing the following crash on a # of machines
we have running a ceph cluster. They are all running Ubuntu 20.04.6
LTS.
Jul 26 15:05:02 rx [11061395.780353] BUG: kernel NULL pointer dereference,
address: 0000000000000020
Jul 26 15:05:02 rx [11061395.787572] #PF: supervisor read access in kernel
mode
Jul 26 15:05:02 rx [11061395.792971] #PF: error_code(0x0000) - not-present
page
Jul 26 15:05:02 rx [11061395.798362] PGD 0 P4D 0
Jul 26 15:05:02 rx [11061395.801164] Oops: 0000 [#1] SMP NOPTI
Jul 26 15:05:02 rx [11061395.805091] CPU: 0 PID: 9180 Comm: msgr-worker-1
Tainted: G W 5.4.0-174-generic #193-Ubuntu
Jul 26 15:05:02 rx [11061395.814996] Hardware name: Supermicro SMC 2x26
os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
Jul 26 15:05:02 rx [11061395.825952] RIP: 0010:tcp_rearm_rto+0xe4/0x160
Jul 26 15:05:02 rx [11061395.830656] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d
5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef
e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7
e3
Jul 26 15:05:02 rx [11061395.849665] RSP: 0018:ffffb75d40003e08 EFLAGS:
00010246
Jul 26 15:05:02 rx [11061395.855149] RAX: 0000000000000000 RBX:
20c49ba5e353f7cf RCX: 0000000000000000
Jul 26 15:05:02 rx [11061395.862542] RDX: 0000000062177c30 RSI:
000000000000231c RDI: ffff9874ad283a60
Jul 26 15:05:02 rx [11061395.869933] RBP: ffffb75d40003e20 R08:
0000000000000000 R09: ffff987605e20aa8
Jul 26 15:05:02 rx [11061395.877318] R10: ffffb75d40003f00 R11:
ffffb75d4460f740 R12: ffff9874ad283900
Jul 26 15:05:02 rx [11061395.884710] R13: ffff9874ad283a60 R14:
ffff9874ad283980 R15: ffff9874ad283d30
Jul 26 15:05:02 rx [11061395.892095] FS: 00007f1ef4a2e700(0000)
GS:ffff987605e00000(0000) knlGS:0000000000000000
Jul 26 15:05:02 rx [11061395.900438] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jul 26 15:05:02 rx [11061395.906435] CR2: 0000000000000020 CR3:
0000003e450ba003 CR4: 0000000000760ef0
Jul 26 15:05:02 rx [11061395.913822] PKRU: 55555554
Jul 26 15:05:02 rx [11061395.916786] Call Trace:
Jul 26 15:05:02 rx [11061395.919488]
Jul 26 15:05:02 rx [11061395.921765] ? show_regs.cold+0x1a/0x1f
Jul 26 15:05:02 rx [11061395.925859] ? __die+0x90/0xd9
Jul 26 15:05:02 rx [11061395.929169] ? no_context+0x196/0x380
Jul 26 15:05:02 rx [11061395.933088] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0
Jul 26 15:05:02 rx [11061395.938216] ? ip6_sublist_rcv_finish+0x3d/0x50
Jul 26 15:05:02 rx [11061395.943000] ? __bad_area_nosemaphore+0x50/0x1a0
Jul 26 15:05:02 rx [11061395.947873] ? bad_area_nosemaphore+0x16/0x20
Jul 26 15:05:02 rx [11061395.952486] ? do_user_addr_fault+0x267/0x450
Jul 26 15:05:02 rx [11061395.957104] ? ipv6_list_rcv+0x112/0x140
Jul 26 15:05:02 rx [11061395.961279] ? __do_page_fault+0x58/0x90
Jul 26 15:05:02 rx [11061395.965458] ? do_page_fault+0x2c/0xe0
Jul 26 15:05:02 rx [11061395.969465] ? page_fault+0x34/0x40
Jul 26 15:05:02 rx [11061395.973217] ? tcp_rearm_rto+0xe4/0x160
Jul 26 15:05:02 rx [11061395.977313] ? tcp_rearm_rto+0xe4/0x160
Jul 26 15:05:02 rx [11061395.981408] tcp_send_loss_probe+0x10b/0x220
Jul 26 15:05:02 rx [11061395.985937] tcp_write_timer_handler+0x1b4/0x240
Jul 26 15:05:02 rx [11061395.990809] tcp_write_timer+0x9e/0xe0
Jul 26 15:05:02 rx [11061395.994814] ? tcp_write_timer_handler+0x240/0x240
Jul 26 15:05:02 rx [11061395.999866] call_timer_fn+0x32/0x130
Jul 26 15:05:02 rx [11061396.003782] __run_timers.part.0+0x180/0x280
Jul 26 15:05:02 rx [11061396.008309] ? recalibrate_cpu_khz+0x10/0x10
Jul 26 15:05:02 rx [11061396.012841] ? native_x2apic_icr_write+0x30/0x30
Jul 26 15:05:02 rx [11061396.017718] ? lapic_next_event+0x21/0x30
Jul 26 15:05:02 rx [11061396.021984] ? clockevents_program_event+0x8f/0xe0
Jul 26 15:05:02 rx [11061396.027035] run_timer_softirq+0x2a/0x50
Jul 26 15:05:02 rx [11061396.031212] __do_softirq+0xd1/0x2c1
Jul 26 15:05:02 rx [11061396.035044] do_softirq_own_stack+0x2a/0x40
Jul 26 15:05:02 rx [11061396.039480]
Jul 26 15:05:02 rx [11061396.041840] do_softirq.part.0+0x46/0x50
Jul 26 15:05:02 rx [11061396.046022] __local_bh_enable_ip+0x50/0x60
Jul 26 15:05:02 rx [11061396.050460] _raw_spin_unlock_bh+0x1e/0x20
Jul 26 15:05:02 rx [11061396.054817] nf_conntrack_tcp_packet+0x29e/0xbe0
[nf_conntrack]
Jul 26 15:05:02 rx [11061396.060994] ? get_l4proto+0xe7/0x190 [nf_conntrack]
Jul 26 15:05:02 rx [11061396.066220] nf_conntrack_in+0xe9/0x670
[nf_conntrack]
Jul 26 15:05:02 rx [11061396.071618] ipv6_conntrack_local+0x14/0x20
[nf_conntrack]
Jul 26 15:05:02 rx [11061396.077356] nf_hook_slow+0x45/0xb0
Jul 26 15:05:02 rx [11061396.081098] ip6_xmit+0x3f0/0x5d0
Jul 26 15:05:02 rx [11061396.084670] ? ipv6_anycast_cleanup+0x50/0x50
Jul 26 15:05:02 rx [11061396.089282] ? __sk_dst_check+0x38/0x70
Jul 26 15:05:02 rx [11061396.093381] ? inet6_csk_route_socket+0x13b/0x200
Jul 26 15:05:02 rx [11061396.098346] inet6_csk_xmit+0xa7/0xf0
Jul 26 15:05:02 rx [11061396.102263] __tcp_transmit_skb+0x550/0xb30
Jul 26 15:05:02 rx [11061396.106701] tcp_write_xmit+0x3c6/0xc20
Jul 26 15:05:02 rx [11061396.110792] ? __alloc_skb+0x98/0x1d0
Jul 26 15:05:02 rx [11061396.114708] __tcp_push_pending_frames+0x37/0x100
Jul 26 15:05:02 rx [11061396.119667] tcp_push+0xfd/0x100
Jul 26 15:05:02 rx [11061396.123150] tcp_sendmsg_locked+0xc70/0xdd0
Jul 26 15:05:02 rx [11061396.127588] tcp_sendmsg+0x2d/0x50
Jul 26 15:05:02 rx [11061396.131245] inet6_sendmsg+0x43/0x70
Jul 26 15:05:02 rx [11061396.135075] __sock_sendmsg+0x48/0x70
Jul 26 15:05:02 rx [11061396.138994] ____sys_sendmsg+0x212/0x280
Jul 26 15:05:02 rx [11061396.143172] ___sys_sendmsg+0x88/0xd0
Jul 26 15:05:02 rx [11061396.147098] ? __seccomp_filter+0x7e/0x6b0
Jul 26 15:05:02 rx [11061396.151446] ? __switch_to+0x39c/0x460
Jul 26 15:05:02 rx [11061396.155453] ? __switch_to_asm+0x42/0x80
Jul 26 15:05:02 rx [11061396.159636] ? __switch_to_asm+0x5a/0x80
Jul 26 15:05:02 rx [11061396.163816] __sys_sendmsg+0x5c/0xa0
Jul 26 15:05:02 rx [11061396.167647] __x64_sys_sendmsg+0x1f/0x30
Jul 26 15:05:02 rx [11061396.171832] do_syscall_64+0x57/0x190
Jul 26 15:05:02 rx [11061396.175748] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Jul 26 15:05:02 rx [11061396.181055] RIP: 0033:0x7f1ef692618d
Jul 26 15:05:02 rx [11061396.184893] Code: 28 89 54 24 1c 48 89 74 24 10 89
7c 24 08 e8 ca ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e
00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 48 89 44 24 08 e8 fe ee ff ff
48
Jul 26 15:05:02 rx [11061396.203889] RSP: 002b:00007f1ef4a26aa0 EFLAGS:
00000293 ORIG_RAX: 000000000000002e
Jul 26 15:05:02 rx [11061396.211708] RAX: ffffffffffffffda RBX:
000000000000084b RCX: 00007f1ef692618d
Jul 26 15:05:02 rx [11061396.219091] RDX: 0000000000004000 RSI:
00007f1ef4a26b10 RDI: 0000000000000275
Jul 26 15:05:02 rx [11061396.226475] RBP: 0000000000004000 R08:
0000000000000000 R09: 0000000000000020
Jul 26 15:05:02 rx [11061396.233859] R10: 0000000000000000 R11:
0000000000000293 R12: 000000000000084b
Jul 26 15:05:02 rx [11061396.241243] R13: 00007f1ef4a26b10 R14:
0000000000000275 R15: 000055592030f1e8
Jul 26 15:05:02 rx [11061396.248628] Modules linked in: vrf bridge stp llc
vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 amd64_edac_mod edac_mce_amd
kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd
glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet
mii ast drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea
sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler nft_ct
sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables
nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 mlx5_core
hid_generic pci_hyperv_intf crc32_pclmul tls usbhid ahci mlxfw bnxt_en libahci
hid nvme i2c_piix4 nvme_core wmi
Jul 26 15:05:02 rx [11061396.324334] CR2: 0000000000000020
Jul 26 15:05:02 rx [11061396.327944] ---[ end trace 68a2b679d1cfb4f1 ]---
Jul 26 15:05:02 rx [11061396.433435] RIP: 0010:tcp_rearm_rto+0xe4/0x160
Jul 26 15:05:02 rx [11061396.438137] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d
5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef
e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7
e3
Jul 26 15:05:02 rx [11061396.457144] RSP: 0018:ffffb75d40003e08 EFLAGS:
00010246
Jul 26 15:05:02 rx [11061396.462629] RAX: 0000000000000000 RBX:
20c49ba5e353f7cf RCX: 0000000000000000
Jul 26 15:05:02 rx [11061396.470012] RDX: 0000000062177c30 RSI:
000000000000231c RDI: ffff9874ad283a60
Jul 26 15:05:02 rx [11061396.477396] RBP: ffffb75d40003e20 R08:
0000000000000000 R09: ffff987605e20aa8
Jul 26 15:05:02 rx [11061396.484779] R10: ffffb75d40003f00 R11:
ffffb75d4460f740 R12: ffff9874ad283900
Jul 26 15:05:02 rx [11061396.492164] R13: ffff9874ad283a60 R14:
ffff9874ad283980 R15: ffff9874ad283d30
Jul 26 15:05:02 rx [11061396.499547] FS: 00007f1ef4a2e700(0000)
GS:ffff987605e00000(0000) knlGS:0000000000000000
Jul 26 15:05:02 rx [11061396.507886] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jul 26 15:05:02 rx [11061396.513884] CR2: 0000000000000020 CR3:
0000003e450ba003 CR4: 0000000000760ef0
Jul 26 15:05:02 rx [11061396.521267] PKRU: 55555554
Jul 26 15:05:02 rx [11061396.524230] Kernel panic - not syncing: Fatal
exception in interrupt
Jul 26 15:05:02 rx [11061396.530885] Kernel Offset: 0x1b200000 from
0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Jul 26 15:05:03 rx [11061396.660181] ---[ end Kernel panic - not syncing:
Fatal
exception in interrupt ]---
Earlier in the day we also saw the following WARNING get triggered:
Jul 26 09:15:27 kernel: ------------[ cut here ]------------
Jul 26 09:15:27 kernel: invalid inflight: 2 state 1 cwnd 68 mss 8988
Jul 26 09:15:27 kernel: WARNING: CPU: 16 PID: 0 at
net/ipv4/tcp_output.c:2605 tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: Modules linked in: vrf bridge stp llc vxlan
ip6_udp_tunnel udp_tunnel nls_iso8859_1 nft_ct amd64_edac_mod edac_mce_amd
kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd
glue_helper wmi_bmof ipmi_ssif joydev input_leds rndis_host cdc_ether usbnet
mii ast drm_vram_helper ttm drm_kms_he>
Jul 26 09:15:27 kernel: CPU: 16 PID: 0 Comm: swapper/16 Not tainted
5.4.0-174-generic #193-Ubuntu
Jul 26 09:15:27 kernel: Hardware name: Supermicro SMC 2x26 os-gen8 64C
NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
Jul 26 09:15:27 kernel: RIP: 0010:tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: Code: 08 26 01 00 75 e2 41 0f b6 54 24 12 41 8b 8c
24 c0 06 00 00 45 89 f0 48 c7 c7 e0 b4 20 a7 c6 05 8d 08 26 01 01 e8 4a c0 0f
00 <0f> 0b eb ba 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
Jul 26 09:15:27 kernel: RSP: 0018:ffffb7838088ce00 EFLAGS: 00010286
Jul 26 09:15:27 kernel: RAX: 0000000000000000 RBX: ffff9b84b5630430 RCX:
0000000000000006
Jul 26 09:15:27 kernel: RDX: 0000000000000007 RSI: 0000000000000096 RDI:
ffff9b8e4621c8c0
Jul 26 09:15:27 kernel: RBP: ffffb7838088ce18 R08: 0000000000000927 R09:
0000000000000004
Jul 26 09:15:27 kernel: R10: 0000000000000000 R11: 0000000000000001 R12:
ffff9b84b5630000
Jul 26 09:15:27 kernel: R13: 0000000000000000 R14: 000000000000231c R15:
ffff9b84b5630430
Jul 26 09:15:27 kernel: FS: 0000000000000000(0000)
GS:ffff9b8e46200000(0000) knlGS:0000000000000000
Jul 26 09:15:27 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 26 09:15:27 kernel: CR2: 000056238cec2380 CR3: 0000003e49ede005 CR4:
0000000000760ee0
Jul 26 09:15:27 kernel: PKRU: 55555554
Jul 26 09:15:27 kernel: Call Trace:
Jul 26 09:15:27 kernel: <IRQ>
Jul 26 09:15:27 kernel: ? show_regs.cold+0x1a/0x1f
Jul 26 09:15:27 kernel: ? __warn+0x98/0xe0
Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: ? report_bug+0xd1/0x100
Jul 26 09:15:27 kernel: ? do_error_trap+0x9b/0xc0
Jul 26 09:15:27 kernel: ? do_invalid_op+0x3c/0x50
Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: ? invalid_op+0x1e/0x30
Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: tcp_write_timer_handler+0x1b4/0x240
Jul 26 09:15:27 kernel: tcp_write_timer+0x9e/0xe0
Jul 26 09:15:27 kernel: ? tcp_write_timer_handler+0x240/0x240
Jul 26 09:15:27 kernel: call_timer_fn+0x32/0x130
Jul 26 09:15:27 kernel: __run_timers.part.0+0x180/0x280
Jul 26 09:15:27 kernel: ? timerqueue_add+0x9b/0xb0
Jul 26 09:15:27 kernel: ? enqueue_hrtimer+0x3d/0x90
Jul 26 09:15:27 kernel: ? do_error_trap+0x9b/0xc0
Jul 26 09:15:27 kernel: ? do_invalid_op+0x3c/0x50
Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: ? invalid_op+0x1e/0x30
Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: tcp_write_timer_handler+0x1b4/0x240
Jul 26 09:15:27 kernel: tcp_write_timer+0x9e/0xe0
Jul 26 09:15:27 kernel: ? tcp_write_timer_handler+0x240/0x240
Jul 26 09:15:27 kernel: call_timer_fn+0x32/0x130
Jul 26 09:15:27 kernel: __run_timers.part.0+0x180/0x280
Jul 26 09:15:27 kernel: ? timerqueue_add+0x9b/0xb0
Jul 26 09:15:27 kernel: ? enqueue_hrtimer+0x3d/0x90
Jul 26 09:15:27 kernel: ? recalibrate_cpu_khz+0x10/0x10
Jul 26 09:15:27 kernel: ? ktime_get+0x3e/0xa0
Jul 26 09:15:27 kernel: ? native_x2apic_icr_write+0x30/0x30
Jul 26 09:15:27 kernel: run_timer_softirq+0x2a/0x50
Jul 26 09:15:27 kernel: __do_softirq+0xd1/0x2c1
Jul 26 09:15:27 kernel: irq_exit+0xae/0xb0
Jul 26 09:15:27 kernel: smp_apic_timer_interrupt+0x7b/0x140
Jul 26 09:15:27 kernel: apic_timer_interrupt+0xf/0x20
Jul 26 09:15:27 kernel: </IRQ>
Jul 26 09:15:27 kernel: RIP: 0010:native_safe_halt+0xe/0x10
Jul 26 09:15:27 kernel: Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00
00 00 0f 00 2d 36 2c 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 2c 50 00 fb
f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 dd 5e 61 ff 65
Jul 26 09:15:27 kernel: RSP: 0018:ffffb783801cfe70 EFLAGS: 00000246
ORIG_RAX: ffffffffffffff13
Jul 26 09:15:27 kernel: RAX: ffffffffa6908b20 RBX: 0000000000000010 RCX:
0000000000000001
Jul 26 09:15:27 kernel: RDX: 000000006fc0c97e RSI: 0000000000000082 RDI:
0000000000000082
Jul 26 09:15:27 kernel: RBP: ffffb783801cfe90 R08: 0000000000000000 R09:
0000000000000225
Jul 26 09:15:27 kernel: R10: 0000000000100000 R11: 0000000000000000 R12:
0000000000000010
Jul 26 09:15:27 kernel: R13: ffff9b8e390b0000 R14: 0000000000000000 R15:
0000000000000000
Jul 26 09:15:27 kernel: ? __cpuidle_text_start+0x8/0x8
Jul 26 09:15:27 kernel: ? default_idle+0x20/0x140
Jul 26 09:15:27 kernel: arch_cpu_idle+0x15/0x20
Jul 26 09:15:27 kernel: default_idle_call+0x23/0x30
Jul 26 09:15:27 kernel: do_idle+0x1fb/0x270
Jul 26 09:15:27 kernel: cpu_startup_entry+0x20/0x30
Jul 26 09:15:27 kernel: start_secondary+0x178/0x1d0
Jul 26 09:15:27 kernel: secondary_startup_64+0xa4/0xb0
Jul 26 09:15:27 kernel: ---[ end trace e7ac822987e33be1 ]---
The warning suggests there is an inconsistency in the accounting of
tp->packets_out where the kernel has a non-zero value for the counter,
but there are no packets in the transmit or retrans queues.
I grabbed the debug symbols for this kernel and looked up the RIP of
where we're seeing the null ptr deref. It points to here:
(gdb) list *(tcp_rearm_rto+0xe4)
0xffffffff819a2d64 is in tcp_rearm_rto
(/build/linux-G34Rii/linux-5.4.0/include/linux/math64.h:126).
warning: Source file is more recent than executable.
121 */
122 #ifndef div_u64
123 static inline u64 div_u64(u64 dividend, u32 divisor)
124 {
125 u32 remainder;
126 return div_u64_rem(dividend, divisor, &remainder);
127 }
128 #endif
129
130 /**
Digging into the objdump we can see 0xffffffff819a2d64 here:
ffffffff819a2d5c: 4c 89 ef mov %r13,%rdi
ffffffff819a2d5f: e8 0c fe 0e 00 callq ffffffff81a92b70
<rb_first>
>>> ffffffff819a2d64: 48 8b 78 20 mov 0x20(%rax),%rdi
ffffffff819a2d68: 48 c1 ef 03 shr $0x3,%rdi
ffffffff819a2d6c: 48 89 f8 mov %rdi,%rax
u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) +
jiffies_to_usecs(rto);
ffffffff819a2d6f: 41 8b bc 24 80 04 00 mov 0x480(%r12),%edi
ffffffff819a2d76: 00
ffffffff819a2d77: 48 f7 e3 mul %rbx
ffffffff819a2d7a: 48 89 d3 mov %rdx,%rbx
ffffffff819a2d7d: e8 9e 3a 78 ff callq ffffffff81126820
<jiffies_to_usecs>
which is part of this code:
/* At how many usecs into the future should the RTO fire? */
static inline s64 tcp_rto_delta_us(const struct sock *sk)
{
const struct sk_buff *skb = tcp_rtx_queue_head(sk);
u32 rto = inet_csk(sk)->icsk_rto;
>>> u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) +
jiffies_to_usecs(rto);
and eventually we see that tcp_skb_timestamp_us() looks like this:
/* provide the departure time in us unit */
static inline u64 tcp_skb_timestamp_us(const struct sk_buff *skb)
{
return div_u64(skb->skb_mstamp_ns, NSEC_PER_USEC);
}
This points us back to the null ptr deref coming in the call to
div_u64() as the original gdb disassembly showed. The failing insn is
"mov 0x20(%rax),%rdi" and skb_mstamp_ns is at offset 0x20 of struct
sk_buff:
$ pahole sk_buff /usr/lib/debug/boot/vmlinux-5.4.0-174-generic
struct sk_buff {
union {
struct {
struct sk_buff * next; /* 0 8 */
struct sk_buff * prev; /* 8 8 */
union {
struct net_device * dev; /* 16 8 */
long unsigned int dev_scratch; /* 16 8 */
}; /* 16 8 */
}; /* 0 24 */
struct rb_node rbnode; /* 0 24 */
struct list_head list; /* 0 16 */
}; /* 0 24 */
union {
struct sock * sk; /* 24 8 */
int ip_defrag_offset; /* 24 4 */
}; /* 24 8 */
union {
ktime_t tstamp; /* 32 8 */
u64 skb_mstamp_ns; /* 32 8 */
With this we now know that the skb was NULL which triggered the crash.
This appears to b/c this call gave us a NULL skb:
const struct sk_buff *skb = tcp_rtx_queue_head(sk);
meaning the retransmit queue was empty.
The same issue was hit a # of years ago as reported here:
https://lore.kernel.org/netdev/[email protected]/T/#t
with the following commit intended to fix it:
commit bffd168c3fc5cc7d2bad4c668fa90e7a9010db4b
Author: Soheil Hassas Yeganeh <[email protected]>
Date: Sat Apr 14 20:44:46 2018 -0400
tcp: clear tp->packets_out when purging write queue
however that commit is in the kernel version which saw this crash.
$ git describe --contains bffd168c3fc5cc7d2bad4c668fa90e7a9010db4b
v5.0~2050^2~28
We've been able to work around the problem by setting
tcp_early_retrans and tcp_recovery to 0 that way we won't hit the
REO_TIMEOUT and LOSS_PROBE states and so will not call
tcp_rto_delta_us():
if (!tp->packets_out) {
inet_csk_clear_xmit_timer(sk, ICSK_TIME_RETRANS);
} else {
u32 rto = inet_csk(sk)->icsk_rto;
/* Offset the time elapsed after installing regular RTO */
if (icsk->icsk_pending == ICSK_TIME_REO_TIMEOUT ||
icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
s64 delta_us = tcp_rto_delta_us(sk);
As mentioned earlier there seems to be a problem with the packets_out
accounting here still which causes the kernel to think there are
either still packets in the write or retransmit queues. Additionally I
feel like we should probably add a patch in tcp_rto_delta_us() to
check that the skb is non-null b/f trying to read the timestamp.
Possibly something like this:
Author: Josh Hunt <[email protected]>
Date: Tue Jul 30 19:45:43 2024 -0400
workaround empty rtx queue
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2aac11e7e1cc..d1e2ecbce536 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2434,7 +2434,8 @@ static inline s64 tcp_rto_delta_us(const struct sock
*sk)
{
const struct sk_buff *skb = tcp_rtx_queue_head(sk);
u32 rto = inet_csk(sk)->icsk_rto;
- u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) +
jiffies_to_usecs(rto);
+ u64 rto_time_stamp_us = skb ? tcp_skb_timestamp_us(skb) : 0 +
jiffies_to_usecs(rto);
+ WARN_ON_ONCE(!skb);
return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
}
I tried looking to see if there are any new commits upstream either in
Ubuntu's 5.4 kernels or upstream's net-next branch which could resolve
this problem, but did not find anything. We've also tried reproducing
this in a lab environment without success so far.
Please let me know if you need any other information to help root
cause this issue.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077657/+subscriptions
--
Mailing list: https://launchpad.net/~kernel-packages
Post to : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp