I forgot to mention why we disabled both sysctls (tcp_early_retrans and
tcp_recovery). We initially disabled only the TLP case by setting
tcp_early_retrans to 0, but then hit a second crash at the same spot in
tcp_rearm_rto, reached through a different call path. This time it was
in the REO case:

Aug  7 07:26:16 rx [1006006.265582] BUG: kernel NULL pointer dereference, address: 0000000000000020
Aug  7 07:26:16 rx [1006006.272719] #PF: supervisor read access in kernel mode
Aug  7 07:26:16 rx [1006006.278030] #PF: error_code(0x0000) - not-present page
Aug  7 07:26:16 rx [1006006.283343] PGD 0 P4D 0 
Aug  7 07:26:16 rx [1006006.286057] Oops: 0000 [#1] SMP NOPTI
Aug  7 07:26:16 rx [1006006.289896] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.4.0-174-generic #193-Ubuntu
Aug  7 07:26:16 rx [1006006.299107] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
Aug  7 07:26:16 rx [1006006.309970] RIP: 0010:tcp_rearm_rto+0xe4/0x160
Aug  7 07:26:16 rx [1006006.314584] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3
Aug  7 07:26:16 rx [1006006.333499] RSP: 0018:ffffb42600a50960 EFLAGS: 00010246
Aug  7 07:26:16 rx [1006006.338895] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000
Aug  7 07:26:16 rx [1006006.346193] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff92d687ed8160
Aug  7 07:26:16 rx [1006006.353489] RBP: ffffb42600a50978 R08: 0000000000000000 R09: 00000000cd896dcc
Aug  7 07:26:16 rx [1006006.360786] R10: ffff92dc3404f400 R11: 0000000000000001 R12: ffff92d687ed8000
Aug  7 07:26:16 rx [1006006.368084] R13: ffff92d687ed8160 R14: 00000000cd896dcc R15: 00000000cd8fca81
Aug  7 07:26:16 rx [1006006.375381] FS:  0000000000000000(0000) GS:ffff93158ad40000(0000) knlGS:0000000000000000
Aug  7 07:26:16 rx [1006006.383632] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug  7 07:26:16 rx [1006006.389544] CR2: 0000000000000020 CR3: 0000003e775ce006 CR4: 0000000000760ee0
Aug  7 07:26:16 rx [1006006.396839] PKRU: 55555554
Aug  7 07:26:16 rx [1006006.399717] Call Trace:
Aug  7 07:26:16 rx [1006006.402335]  
Aug  7 07:26:16 rx [1006006.404525]  ? show_regs.cold+0x1a/0x1f
Aug  7 07:26:16 rx [1006006.408532]  ? __die+0x90/0xd9
Aug  7 07:26:16 rx [1006006.411760]  ? no_context+0x196/0x380
Aug  7 07:26:16 rx [1006006.415599]  ? __bad_area_nosemaphore+0x50/0x1a0
Aug  7 07:26:16 rx [1006006.420392]  ? _raw_spin_lock+0x1e/0x30
Aug  7 07:26:16 rx [1006006.424401]  ? bad_area_nosemaphore+0x16/0x20
Aug  7 07:26:16 rx [1006006.428927]  ? do_user_addr_fault+0x267/0x450
Aug  7 07:26:16 rx [1006006.433450]  ? __do_page_fault+0x58/0x90
Aug  7 07:26:16 rx [1006006.437542]  ? do_page_fault+0x2c/0xe0
Aug  7 07:26:16 rx [1006006.441470]  ? page_fault+0x34/0x40
Aug  7 07:26:16 rx [1006006.445134]  ? tcp_rearm_rto+0xe4/0x160
Aug  7 07:26:16 rx [1006006.449145]  tcp_ack+0xa32/0xb30
Aug  7 07:26:16 rx [1006006.452542]  tcp_rcv_established+0x13c/0x670
Aug  7 07:26:16 rx [1006006.456981]  ? sk_filter_trim_cap+0x48/0x220
Aug  7 07:26:16 rx [1006006.461419]  tcp_v6_do_rcv+0xdb/0x450
Aug  7 07:26:16 rx [1006006.465257]  tcp_v6_rcv+0xc2b/0xd10
Aug  7 07:26:16 rx [1006006.468918]  ip6_protocol_deliver_rcu+0xd3/0x4e0
Aug  7 07:26:16 rx [1006006.473706]  ip6_input_finish+0x15/0x20
Aug  7 07:26:16 rx [1006006.477710]  ip6_input+0xa2/0xb0
Aug  7 07:26:16 rx [1006006.481109]  ? ip6_protocol_deliver_rcu+0x4e0/0x4e0
Aug  7 07:26:16 rx [1006006.486151]  ip6_sublist_rcv_finish+0x3d/0x50
Aug  7 07:26:16 rx [1006006.490679]  ip6_sublist_rcv+0x1aa/0x250
Aug  7 07:26:16 rx [1006006.494779]  ? ip6_rcv_finish_core.isra.0+0xa0/0xa0
Aug  7 07:26:16 rx [1006006.499828]  ipv6_list_rcv+0x112/0x140
Aug  7 07:26:16 rx [1006006.503748]  __netif_receive_skb_list_core+0x1a4/0x250
Aug  7 07:26:16 rx [1006006.509057]  netif_receive_skb_list_internal+0x1a1/0x2b0
Aug  7 07:26:16 rx [1006006.514538]  gro_normal_list.part.0+0x1e/0x40
Aug  7 07:26:16 rx [1006006.519068]  napi_complete_done+0x91/0x130
Aug  7 07:26:16 rx [1006006.523352]  mlx5e_napi_poll+0x18e/0x610 [mlx5_core]
Aug  7 07:26:16 rx [1006006.528481]  net_rx_action+0x142/0x390
Aug  7 07:26:16 rx [1006006.532398]  __do_softirq+0xd1/0x2c1
Aug  7 07:26:16 rx [1006006.536142]  irq_exit+0xae/0xb0
Aug  7 07:26:16 rx [1006006.539452]  do_IRQ+0x5a/0xf0
Aug  7 07:26:16 rx [1006006.542590]  common_interrupt+0xf/0xf
Aug  7 07:26:16 rx [1006006.546421]  
Aug  7 07:26:16 rx [1006006.548695] RIP: 0010:native_safe_halt+0xe/0x10
Aug  7 07:26:16 rx [1006006.553399] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 36 2c 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 2c 50 00 fb f4  90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 dd 5e 61 ff 65
Aug  7 07:26:16 rx [1006006.572309] RSP: 0018:ffffb42600177e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffc2
Aug  7 07:26:16 rx [1006006.580040] RAX: ffffffff8ed08b20 RBX: 0000000000000005 RCX: 0000000000000001
Aug  7 07:26:16 rx [1006006.587337] RDX: 00000000f48eeca2 RSI: 0000000000000082 RDI: 0000000000000082
Aug  7 07:26:16 rx [1006006.594635] RBP: ffffb42600177e90 R08: 0000000000000000 R09: 000000000000020f
Aug  7 07:26:16 rx [1006006.601931] R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000005
Aug  7 07:26:16 rx [1006006.609229] R13: ffff93157deb5f00 R14: 0000000000000000 R15: 0000000000000000
Aug  7 07:26:16 rx [1006006.616530]  ? __cpuidle_text_start+0x8/0x8
Aug  7 07:26:16 rx [1006006.620886]  ? default_idle+0x20/0x140
Aug  7 07:26:16 rx [1006006.624804]  arch_cpu_idle+0x15/0x20
Aug  7 07:26:16 rx [1006006.628545]  default_idle_call+0x23/0x30
Aug  7 07:26:16 rx [1006006.632640]  do_idle+0x1fb/0x270
Aug  7 07:26:16 rx [1006006.636035]  cpu_startup_entry+0x20/0x30
Aug  7 07:26:16 rx [1006006.640126]  start_secondary+0x178/0x1d0
Aug  7 07:26:16 rx [1006006.644218]  secondary_startup_64+0xa4/0xb0
Aug  7 07:26:17 rx [1006006.648568] Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 nft_ct amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet ast mii drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 hid_generic mlx5_core pci_hyperv_intf crc32_pclmul usbhid ahci tls mlxfw bnxt_en hid libahci nvme i2c_piix4 nvme_core wmi [last unloaded: cpuid]
Aug  7 07:26:17 rx [1006006.726180] CR2: 0000000000000020
Aug  7 07:26:17 rx [1006006.729718] ---[ end trace e0e2e37e4e612984 ]--- 
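
As a reading aid for the trace above: the marked bytes in the Code: line
(<48> 8b 78 20) decode to a 64-bit load from 0x20(%rax), and RAX is 0, so
the faulting address is 0x20, which matches CR2. That is consistent with a
NULL skb being read at a field 0x20 bytes into the structure. The sketch
below is only a user-space illustration. The mock_* types are hypothetical
stand-ins, and the layout is an assumption that the leading members of
struct sk_buff on a 64-bit 5.4 kernel are a 24-byte node union, an 8-byte
sk pointer, and then skb_mstamp_ns. It has not been checked against this
exact build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rb_node: three pointer-sized words. */
struct mock_rb_node {
    unsigned long parent_color;
    struct mock_rb_node *right;
    struct mock_rb_node *left;
};

/*
 * Hypothetical stand-in for the first members of struct sk_buff.
 * ASSUMPTION: mirrors the 64-bit 5.4 layout; not the real definition.
 */
struct mock_skb {
    struct mock_rb_node rbnode; /* stands in for the next/prev/dev union */
    void *sk;                   /* stands in for the sk union */
    uint64_t skb_mstamp_ns;     /* timestamp field read on the crash path */
};

/* Address touched when skb->skb_mstamp_ns is loaded for a given skb value. */
static uintptr_t mstamp_load_address(uintptr_t skb)
{
    return skb + offsetof(struct mock_skb, skb_mstamp_ns);
}
```

On a 64-bit build, passing 0 for skb models the NULL head of the
retransmit queue, and the returned address is the field offset itself.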

Also, the proposed "workaround" patch I pasted in the original writeup
is not correct. I think something like the following may be more
appropriate:

Author: Josh Hunt <joh...@akamai.com>
Date:   Tue Jul 30 19:45:43 2024 -0400

    tcp: check skb is non-NULL before using in tcp_rto_delta_us()
    
    There have been multiple occasions where we have crashed in this path
    because packets_out suggested there were packets on the write or
    retransmit queues, but in fact there weren't, leading to a NULL skb
    being dereferenced. While we should fix that root cause, we should also
    just make sure the skb is not NULL before dereferencing it. Also add a
    warn once here to capture some information if/when the problem case is
    hit again.
    
    Signed-off-by: Josh Hunt <joh...@akamai.com>

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2aac11e7e1cc..932f0de641e4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2434,9 +2434,18 @@ static inline s64 tcp_rto_delta_us(const struct sock *sk)
 {
        const struct sk_buff *skb = tcp_rtx_queue_head(sk);
        u32 rto = inet_csk(sk)->icsk_rto;
-       u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto);
 
-       return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
+       if (likely(skb)) {
+               u64 rto_time_stamp_us = tcp_skb_timestamp_us(skb) + jiffies_to_usecs(rto);
+
+               return rto_time_stamp_us - tcp_sk(sk)->tcp_mstamp;
+       } else {
+               WARN_ONCE(1,
+                       "rtx queue empty: inflight %u tlp_high_seq %u state %u\n",
+                       tcp_sk(sk)->packets_out, tcp_sk(sk)->tlp_high_seq, sk->sk_state);
+               return jiffies_to_usecs(rto);
+       }
+
 }
 
 /*
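
For anyone who wants to poke at the guard outside the kernel, here is a
minimal user-space sketch of the same control flow. Everything named
mock_* is hypothetical and only stands in for the kernel objects: rtx_head
plays the role of tcp_rtx_queue_head(sk), rto_us the role of
jiffies_to_usecs(icsk_rto), now_us the role of tcp_mstamp, and the warned
flag the role of WARN_ONCE(). It is not the kernel code and does not
address the root cause of packets_out disagreeing with the queues.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct mock_skb {
    uint64_t tx_time_us;             /* send timestamp of the queue head */
};

struct mock_sock {
    const struct mock_skb *rtx_head; /* NULL when the rtx queue is empty */
    uint64_t rto_us;                 /* current RTO in microseconds */
    uint64_t now_us;                 /* most recent socket timestamp */
    unsigned int warned;             /* set where the patch would warn */
};

/* Model of the patched helper: never dereferences a NULL queue head. */
static int64_t mock_rto_delta_us(struct mock_sock *sk)
{
    const struct mock_skb *skb = sk->rtx_head;

    if (skb) {
        uint64_t rto_time_stamp_us = skb->tx_time_us + sk->rto_us;

        /* Time left until the RTO of the oldest unacked packet fires. */
        return (int64_t)(rto_time_stamp_us - sk->now_us);
    }

    /* Empty queue despite the caller expecting packets: note it, fall back. */
    sk->warned = 1;
    return (int64_t)sk->rto_us;
}
```

With a non-NULL head the result is the usual remaining time. With a NULL
head the function records the condition and falls back to the full RTO
instead of faulting.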

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2077657

Title:
  Kernel Oops - BUG: kernel NULL pointer dereference, RIP:
  0010:tcp_rearm_rto+0xe4/0x160
