** Description changed:

+ [SRU Justification]
+
+ = Impact =
+
+ A potential race between cancelling offloaded traffic timeouts on busy
+ systems and those timeouts triggering could potentially crash the
+ system.
+
+ = Fix =
+
+ Picking a patch (and its pre-req which just moves code from local code
+ into a header) that sets sufficiently large timeout values to prevent
+ those from accidentally triggering will solve the problem.
+
+ = Testcase =
+
+ See original description below.
+
+ = Regression Potential =
+
+ If those large timeouts never happen (from the code description those
+ are set to days) and are not stopped by the offload functions, this
+ could lead to stuck traffic and possibly running out of buffers/memory.
+
+ --- original description ---
+
  Configuring CT offload with OVS and running stress http traffic that opens conns, send short data and close the conns. there is a race that could potentially crash the system.

  X86 side:
  /etc/init.d/openibd restart
  ifconfig $1 up
  ifconfig $2 up
  tc qdisc del dev $1 ingress
  tc qdisc del dev $2 ingress
  sleep 5
  tc qdisc add dev $1 ingress
  tc qdisc add dev $2 ingress
  tc filter add dev $1 protocol all parent ffff: flower action mirred egress redirect dev $2
  tc filter add dev $2 protocol all parent ffff: flower action mirred egress redirect dev $1
  ip l set dev $1 promisc on
  ip l set dev $2 promisc on

  arm side:
  ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
  service openvswitch restart
  for br in `ovs-vsctl list-br`; do
-   ovs-vsctl del-br $br
+   ovs-vsctl del-br $br
  done
  ovs-vsctl add-br ovsbr1
  ovs-vsctl add-port ovsbr1 p0
  ovs-vsctl add-port ovsbr1 pf0hpf
  ovs-vsctl add-br ovsbr2
  ovs-vsctl add-port ovsbr2 p1
  ovs-vsctl add-port ovsbr2 pf1hpf
  ovs-ofctl del-flows ovsbr1
  ovs-ofctl add-flow ovsbr1 arp,actions=normal
  ovs-ofctl add-flow ovsbr1 "table=0, ip,ct_state=-trk actions=ct(table=1)"
  ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=+trk+new actions=ct(, commit),normal"
  ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=+trk+est actions=normal"
-
-
-
-
  # ovs-vsctl show
  9b68adbd-406b-4f72-8b4c-312d9379b8b9
-     Bridge ovsbr2
-         Port ovsbr2
-             Interface ovsbr2
-                 type: internal
-         Port pf1hpf
-             Interface pf1hpf
-         Port p1
-             Interface p1
-     Bridge ovsbr1
-         Port p0
-             Interface p0
-         Port ovsbr1
-             Interface ovsbr1
-                 type: internal
-         Port pf0hpf
-             Interface pf0hpf
-     ovs_version: "2.14.1"
- dmesg:
+     Bridge ovsbr2
+         Port ovsbr2
+             Interface ovsbr2
+                 type: internal
+         Port pf1hpf
+             Interface pf1hpf
+         Port p1
+             Interface p1
+     Bridge ovsbr1
+         Port p0
+             Interface p0
+         Port ovsbr1
+             Interface ovsbr1
+                 type: internal
+         Port pf0hpf
+             Interface pf0hpf
+     ovs_version: "2.14.1"
+ dmesg:

- 1285.179728] Failed to associated timeout policy `ovs_test_tp'
- [ 1587.421221] Unable to handle kernel NULL pointer dereference at virtual address 000000000000004c
- [ 1587.430043] Mem abort info:
- [ 1587.432929] ESR = 0x96000004
- [ 1587.436025] EC = 0x25: DABT (current EL), IL = 32 bits
- [ 1587.421221] Unable to handle k[ 1587.441377] SET = 0, FnV = 0
- ernel NULL pointer dereference a[ 1587.447279] EA = 0, S1PTW = 0
- t virtual address 000000000000004[ 1587.453188] Data abort info:
- c
- [ 1587.458924] ISV = 0, ISS = 0x00000004
- [ 1587.462977] CM = 0, WnR = 0
- [ 1587.465945] user pgtable: 4k pages, 48-bit VAs, pgdp=00000003cc276000
- [ 1587.472420] [000000000000004c] pgd=0000000000000000
- [ 1587.430043] Mem abort info:
- [ 1587.477324] Internal error: Oops: 96000004 [#1] PREEMPT SMP
+ 1285.179728] Failed to associated timeout policy `ovs_test_tp'
+ [ 1587.421221] Unable to handle kernel NULL pointer dereference at virtual address 000000000000004c
+ [ 1587.430043] Mem abort info:
+ [ 1587.432929] ESR = 0x96000004
+ [ 1587.436025] EC = 0x25: DABT (current EL), IL = 32 bits
+ [ 1587.421221] Unable to handle k[ 1587.441377] SET = 0, FnV = 0
+ ernel NULL pointer dereference a[ 1587.447279] EA = 0, S1PTW = 0
+ t virtual address 000000000000004[ 1587.453188] Data abort info:
+ c
+ [ 1587.458924] ISV = 0, ISS = 0x00000004
+ [ 1587.462977] CM = 0, WnR = 0
+ [ 1587.465945] user pgtable: 4k pages, 48-bit VAs, pgdp=00000003cc276000
+ [ 1587.472420] [000000000000004c] pgd=0000000000000000
+ [ 1587.430043] Mem abort info:
+ [ 1587.477324] Internal error: Oops: 96000004 [#1] PREEMPT SMP
  [ 1587.485641] Modules linked in: act_mirred act_skbedit xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat bpfilter br_netfilter bridge xfrm_user xfrm_algo target_core_mod 8021q garp stp mrp llc ovk
- [ 1587.432929] ESR = 0x96000004[ 1587.566060] CPU: 2 PID: 2212 Comm: kworker/2:3 Tainted: G OE 5.4.0-1007-bluefield #10-Ubuntu
- [ 1587.578523] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.6.0-17-gd17a51a Mar 5 2021
-
- [ 1587.579483] Unable to handle kernel NULL pointer dereference at virtual address 000000000000006c
- [ 1587.589851] Workqueue: events rht_deferred_worker
- [ 1587.436025] EC = 0x25: DABT [ 1587.589856] pstate: 80000005 (Nzcv daif -PAN -UAO)
- (current EL), IL = 32 bits
- [ 158[ 1587.589859] pc : rhashtable_rehash_table+0xfc/0x410
- 7.441377] SET = 0, FnV = 0
- [ 1587.447279] EA = 0, S1PTW = 0
- [ 1587.453188] Data abort info:
- [ 1587.458924] ISV = 0, ISS = 0x00000004
- [ 1587.462977] CM = 0, WnR = 0
- [ 1587.465945] user pgtable: 4k pages, 48-bit VAs, pgdp=00000003cc276000
- [ 1587.472420] [000000000000004c] pgd=0000000000000000
- [ 1587.477324] Internal error: Oops: 96000004 [#1] PREEMPT SMP
+ [ 1587.432929] ESR = 0x96000004[ 1587.566060] CPU: 2 PID: 2212 Comm: kworker/2:3 Tainted: G OE 5.4.0-1007-bluefield #10-Ubuntu
+ [ 1587.578523] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.6.0-17-gd17a51a Mar 5 2021
+
+ [ 1587.579483] Unable to handle kernel NULL pointer dereference at virtual address 000000000000006c
+ [ 1587.589851] Workqueue: events rht_deferred_worker
+ [ 1587.436025] EC = 0x25: DABT [ 1587.589856] pstate: 80000005 (Nzcv daif -PAN -UAO)
+ (current EL), IL = 32 bits
+ [ 158[ 1587.589859] pc : rhashtable_rehash_table+0xfc/0x410
+ 7.441377] SET = 0, FnV = 0
+ [ 1587.447279] EA = 0, S1PTW = 0
+ [ 1587.453188] Data abort info:
+ [ 1587.458924] ISV = 0, ISS = 0x00000004
+ [ 1587.462977] CM = 0, WnR = 0
+ [ 1587.465945] user pgtable: 4k pages, 48-bit VAs, pgdp=00000003cc276000
+ [ 1587.472420] [000000000000004c] pgd=0000000000000000
+ [ 1587.477324] Internal error: Oops: 96000004 [#1] PREEMPT SMP
  [ 1587.485641] Modules linked in: act_mirred act_skbedit xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat bpfilter br_netfilter bridge xfrm_user xfrm_algo target_core_mod 8021q garp stp mrp llc ovk
- [ 1587.566060] CPU: 2 PID: 2212 Comm: kworker/2:3 Tainted: G OE 5.4.0-1007-bluefield #10-Ubuntu
- [ 1587.578523] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.6.0-17-gd17a51a Mar 5 2021
- [ 1587.579483] Unable to handle kernel NULL pointer dereference at virtual address 000000000000006c
- [ 1587.589851] Workqueue: events rht_deferred_worker
- [ 1587.589856] pstate: 80000005 (Nzcv daif -PAN -UAO)
- [ 1587.589859] pc : rhashtable_rehash_table+0xfc/0x410
- [ 1587.589861] lr : rhashtable_rehash_table+0x2e0/0x410
- [ 1587.589862] sp : ffff800013ebbcf0
- [ 1587.589864] x29: ffff800013ebbcf0 x28: ffff0002cd8138b0
- [ 1587.589866] x27: 000000000000004c x26: ffff0002d3c00000
- [ 1587.589873] x25: ffff0002cd8138b1 x24: ffff0002cd800000
- [ 1587.598798] Mem abort info:
- [ 1587.603483] x23: ffff0003ebbf5700 x22: 000000000000270e
- [ 1587.603486] x21: 000000000000004c x20: ffff00030cc8f400
- [ 1587.603487] x19: ffff0002fabf5a28 x18: ffff800008fe8000
- [ 1587.603489] x17: 000000002610869a x16: 00000000e3e01d27
- [ 1587.603491] x15: 0000000000000000 x14: 06000000a46b5000
- [ 1587.603492] x13: 4301001003000030 x12: 0000040000000000
- [ 1587.603494] x11: 0000000000000000 x10: 0000000000000001
- [ 1587.603496] x9 : 0000000020000000 x8 : 0000000000000000
- [ 1587.603497] x7 : 0000000000000001 x6 : ffff0002d3c6c301
+ [ 1587.566060] CPU: 2 PID: 2212 Comm: kworker/2:3 Tainted: G OE 5.4.0-1007-bluefield #10-Ubuntu
+ [ 1587.578523] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.6.0-17-gd17a51a Mar 5 2021
+ [ 1587.579483] Unable to handle kernel NULL pointer dereference at virtual address 000000000000006c
+ [ 1587.589851] Workqueue: events rht_deferred_worker
+ [ 1587.589856] pstate: 80000005 (Nzcv daif -PAN -UAO)
+ [ 1587.589859] pc : rhashtable_rehash_table+0xfc/0x410
+ [ 1587.589861] lr : rhashtable_rehash_table+0x2e0/0x410
+ [ 1587.589862] sp : ffff800013ebbcf0
+ [ 1587.589864] x29: ffff800013ebbcf0 x28: ffff0002cd8138b0
+ [ 1587.589866] x27: 000000000000004c x26: ffff0002d3c00000
+ [ 1587.589873] x25: ffff0002cd8138b1 x24: ffff0002cd800000
+ [ 1587.598798] Mem abort info:
+ [ 1587.603483] x23: ffff0003ebbf5700 x22: 000000000000270e
+ [ 1587.603486] x21: 000000000000004c x20: ffff00030cc8f400
+ [ 1587.603487] x19: ffff0002fabf5a28 x18: ffff800008fe8000
+ [ 1587.603489] x17: 000000002610869a x16: 00000000e3e01d27
+ [ 1587.603491] x15: 0000000000000000 x14: 06000000a46b5000
+ [ 1587.603492] x13: 4301001003000030 x12: 0000040000000000
+ [ 1587.603494] x11: 0000000000000000 x10: 0000000000000001
+ [ 1587.603496] x9 : 0000000020000000 x8 : 0000000000000000
+ [ 1587.603497] x7 : 0000000000000001 x6 : ffff0002d3c6c301
  [ 1587.603499] x5 : ffff0002d3c00040 x4 : ffff00030cc8f400
  [ 1587.603501] x3 : 00000000000138b0 x2 : ffff00030cc8f401
  [ 1587.603503] x1 : ffff00030cc8f400 x0 : 0000000000000000
  [ 1587.603505] Call trace:
  [ 1587.603515] rhashtable_rehash_table+0xfc/0x410
  [ 1587.603517] rht_deferred_worker+0x18c/0x298
  [ 1587.603523] process_one_work+0x1c4/0x480
  [ 1587.603531] worker_thread+0x54/0x430
  [ 1587.603533] kthread+0x138/0x150
  [ 1587.603537] ret_from_fork+0x10/0x1c
  [ 1587.603542] Code: d2800014 14000003 aa1b03f4 aa1503fb (f9400375)
  [ 1587.603554] ---[ end trace 8b876994a5c4b259 ]---
  [ 1587.603558] Kernel panic - not syncing: Fatal exception in interrupt
  [ 1587.611162] ESR = 0x96000004
  [ 1587.589861] lr : rhashtable_rehash_table+0x2e0/0x410
  [ 1587.589862] sp : ffff800013ebbcf0
  [ 1587.589864] x29: ffff800013ebbcf0 x28: ffff0002cd8138b0
  [ 1587.589866] x27: 000000000000004c x26: ffff0002d3c00000
  [ 1587.589873] x25: ffff0002cd8138b1 x24: ffff0002cd800000
  [ 1587.598798] Mem abort info:
  [ 1587.603483] x23: ffff0003ebbf5700 x22: 000000000000270e
  [ 1587.603486] x21: 000000000000004c x20: ffff00030cc8f400
  [ 1587.603487] x19: ffff0002fabf5a28 x18: ffff800008fe8000
  [ 1587.603489] x17: 000000002610869a x16: 00000000e3e01d27
  [ 1587.603491] x15: 0000000000000000 x14: 06000000a46b5000
  [ 1587.603492] x13: 4301001003000030 x12: 0000040000000000
  [ 1587.603494] x11: 0000000000000000 x10: 0000000000000001
  [ 1587.603496] x9 : 0000000020000000 x8 : 0000000000000000
  [ 1587.603497] x7 : 0000000000000001 x6 : ffff0002d3c6c301
  [ 1587.603499] x5 : ffff0002d3c00040 x4 : ffff00030cc8f400
  [ 1587.603501] x3 : 00000000000138b0 x2 : ffff00030cc8f401
  [ 1587.603503] x1 : ffff00030cc8f400 x0 : 0000000000000000
  [ 1587.603505] Call trace:
  [ 1587.603515] rhashtable_rehash_table+0xfc/0x410
  [ 1587.603517] rht_deferred_worker+0x18c/0x298
  [ 1587.603523] process_one_work+0x1c4/0x480
  [ 1587.603531] worker_thread+0x54/0x430
  [ 1587.603533] kthread+0x138/0x150
  [ 1587.603537] ret_from_fork+0x10/0x1c
  [ 1587.603542] Code: d2800014 14000003 aa1b03f4 aa1503fb (f9400375)
  [ 1587.603554] ---[ end trace 8b876994a5c4b259 ]---
  [ 1587.603558] Kernel panic - not syncing: Fatal exception in interrupt
  [ 1587.611162] ESR = 0x96000004
  [ 1587.911485] SMP: stopping secondary CPUs
  [ 1587.911541] Kernel Offset: disabled
  [ 1587.911545] CPU features: 0x0002,20006008
  [ 1587.911547] Memory Limit: none
  [ 1588.062206] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

** Changed in: linux-bluefield (Ubuntu Focal)
       Status: Triaged => In Progress

** Changed in: linux-bluefield (Ubuntu Focal)
     Assignee: (unassigned) => Roi Dayan (roidayan)

https://bugs.launchpad.net/bugs/1922672

Title:
  kernel crash with stress CT offload traffic