** Description changed:

+ [SRU Justification]
+ 
+ = Impact =
+ 
+ On busy systems, a race between cancelling offloaded traffic timeouts
+ and those timeouts firing could crash the system.
+ 
+ = Fix =
+ 
+ Pick a patch (and its prerequisite, which only moves local code into a
+ header) that sets the timeout values large enough that they cannot
+ fire accidentally. This solves the problem.
+ 
+ = Testcase =
+ 
+ See original description below.
+ 
+ = Regression Potential =
+ 
+ If those large timeouts never fire (according to the code description
+ they are set to days) and are not cancelled by the offload functions,
+ this could lead to stuck traffic and possibly running out of
+ buffers/memory.
+ 
+ --- original description ---
+ 
  Configuring CT offload with OVS and running stress HTTP traffic that
  opens connections, sends short data, and closes the connections
  exposes a race that could crash the system.
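
The report does not say which tool generated the stress traffic. As a rough sketch only, short-lived HTTP connections of the kind described could be produced with a loop like the one below. The function name, the default request count, and the parallelism are assumptions for illustration, and it presumes curl, seq, and an xargs that supports -P are installed on the client.

```shell
#!/bin/sh
# Hypothetical stress loop (not from the original report): issue many
# short HTTP requests in parallel, one connection per request.
# Arguments: URL, total number of requests, number of parallel workers.
stress_http() {
    url="$1"
    total="${2:-10000}"   # assumed default, tune for the test system
    jobs="${3:-64}"       # assumed default parallelism

    # Each curl process opens its own TCP connection; "Connection: close"
    # asks the server to close it as soon as the response is sent.
    seq "$total" | xargs -P "$jobs" -I{} \
        curl -s -o /dev/null -H 'Connection: close' "$url"
}
```

It would be called with the address of an HTTP server reachable through the offloaded path, for example `stress_http http://<server>/ 100000 128`, where the server address is whatever the test setup provides.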
  
  X86 side:
  
  /etc/init.d/openibd restart
  
  ifconfig $1 up
  ifconfig $2 up
  
  tc qdisc del dev $1 ingress
  tc qdisc del dev $2 ingress
  
  sleep 5
  
  tc qdisc add dev $1 ingress
  tc qdisc add dev $2 ingress
  
  tc filter add dev $1  protocol all parent ffff: flower action mirred egress redirect dev $2
  tc filter add dev $2  protocol all parent ffff: flower action mirred egress redirect dev $1
  
  ip l set dev $1 promisc on
  ip l set dev $2 promisc on
  
  ARM side:
  
  ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
  
  service openvswitch restart
  
  for br in `ovs-vsctl list-br`;
  do
-         ovs-vsctl del-br $br
+         ovs-vsctl del-br $br
  done
  
  ovs-vsctl add-br ovsbr1
  ovs-vsctl add-port ovsbr1 p0
  ovs-vsctl add-port ovsbr1 pf0hpf
  
  ovs-vsctl add-br ovsbr2
  ovs-vsctl add-port ovsbr2 p1
  ovs-vsctl add-port ovsbr2 pf1hpf
  
  ovs-ofctl del-flows ovsbr1
  ovs-ofctl add-flow ovsbr1 arp,actions=normal
  ovs-ofctl add-flow ovsbr1 "table=0, ip,ct_state=-trk actions=ct(table=1)"
  ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=+trk+new actions=ct(, commit),normal"
  ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=+trk+est actions=normal"
  
- 
- 
- 
- 
  # ovs-vsctl show
  9b68adbd-406b-4f72-8b4c-312d9379b8b9
-     Bridge ovsbr2
-         Port ovsbr2
-             Interface ovsbr2
-                 type: internal
-         Port pf1hpf
-             Interface pf1hpf
-         Port p1
-             Interface p1
-     Bridge ovsbr1
-         Port p0
-             Interface p0
-         Port ovsbr1
-             Interface ovsbr1
-                 type: internal
-         Port pf0hpf
-             Interface pf0hpf
-     ovs_version: "2.14.1"
-  dmesg:
+     Bridge ovsbr2
+         Port ovsbr2
+             Interface ovsbr2
+                 type: internal
+         Port pf1hpf
+             Interface pf1hpf
+         Port p1
+             Interface p1
+     Bridge ovsbr1
+         Port p0
+             Interface p0
+         Port ovsbr1
+             Interface ovsbr1
+                 type: internal
+         Port pf0hpf
+             Interface pf0hpf
+     ovs_version: "2.14.1"
+  dmesg:
  
-  1285.179728] Failed to associated timeout policy `ovs_test_tp'               
                                                                                
                                                                            
- [ 1587.421221] Unable to handle kernel NULL pointer dereference at virtual 
address 000000000000004c                                                        
                                                                                
- [ 1587.430043] Mem abort info:                                                
                                                                                
                                                                             
- [ 1587.432929]   ESR = 0x96000004                                             
                                                                                
                                                                             
- [ 1587.436025]   EC = 0x25: DABT (current EL), IL = 32 bits                   
                                                                                
                                                                             
- [ 1587.421221] Unable to handle k[ 1587.441377]   SET = 0, FnV = 0            
                                                                                
                                                                             
- ernel NULL pointer dereference a[ 1587.447279]   EA = 0, S1PTW = 0            
                                                                                
                                                                             
- t virtual address 000000000000004[ 1587.453188] Data abort info:              
                                                                                
                                                                             
- c                                                                             
                                                                                
                                                                             
- [ 1587.458924]   ISV = 0, ISS = 0x00000004                                    
                                                                                
                                                                             
- [ 1587.462977]   CM = 0, WnR = 0                                              
                                                                                
                                                                             
- [ 1587.465945] user pgtable: 4k pages, 48-bit VAs, pgdp=00000003cc276000      
                                                                                
                                                                             
- [ 1587.472420] [000000000000004c] pgd=0000000000000000                        
                                                                                
                                                                             
- [ 1587.430043] Mem abort info:                                                
                                                                                
                                                                             
- [ 1587.477324] Internal error: Oops: 96000004 [#1] PREEMPT SMP                
                                                                                
                                                                             
+  1285.179728] Failed to associated timeout policy `ovs_test_tp'
+ [ 1587.421221] Unable to handle kernel NULL pointer dereference at virtual address 000000000000004c
+ [ 1587.430043] Mem abort info:
+ [ 1587.432929]   ESR = 0x96000004
+ [ 1587.436025]   EC = 0x25: DABT (current EL), IL = 32 bits
+ [ 1587.421221] Unable to handle k[ 1587.441377]   SET = 0, FnV = 0
+ ernel NULL pointer dereference a[ 1587.447279]   EA = 0, S1PTW = 0
+ t virtual address 000000000000004[ 1587.453188] Data abort info:
+ c
+ [ 1587.458924]   ISV = 0, ISS = 0x00000004
+ [ 1587.462977]   CM = 0, WnR = 0
+ [ 1587.465945] user pgtable: 4k pages, 48-bit VAs, pgdp=00000003cc276000
+ [ 1587.472420] [000000000000004c] pgd=0000000000000000
+ [ 1587.430043] Mem abort info:
+ [ 1587.477324] Internal error: Oops: 96000004 [#1] PREEMPT SMP
  [ 1587.485641] Modules linked in: act_mirred act_skbedit xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat bpfilter br_netfilter bridge xfrm_user xfrm_algo target_core_mod 8021q garp stp mrp llc ovk
- [ 1587.432929]   ESR = 0x96000004[ 1587.566060] CPU: 2 PID: 2212 Comm: 
kworker/2:3 Tainted: G           OE     5.4.0-1007-bluefield #10-Ubuntu         
                                                                                
    
- [ 1587.578523] Hardware name: https://www.mellanox.com BlueField 
SoC/BlueField SoC, BIOS BlueField:3.6.0-17-gd17a51a Mar  5 2021                 
                                                                                
          
-                                                                               
                                                                                
                                                                             
- [ 1587.579483] Unable to handle kernel NULL pointer dereference at virtual 
address 000000000000006c                                                        
                                                                                
- [ 1587.589851] Workqueue: events rht_deferred_worker                          
                                                                                
                                                                             
- [ 1587.436025]   EC = 0x25: DABT [ 1587.589856] pstate: 80000005 (Nzcv daif 
-PAN -UAO)                                                                      
                                                                               
- (current EL), IL = 32 bits                                                    
                                                                                
                                                                             
- [ 158[ 1587.589859] pc : rhashtable_rehash_table+0xfc/0x410                   
                                                                                
                                                                             
- 7.441377]   SET = 0, FnV = 0                                                  
                                                                                
                                                                             
- [ 1587.447279]   EA = 0, S1PTW = 0                                            
                                                                                
                                                                             
- [ 1587.453188] Data abort info:                                               
                                                                                
                                                                             
- [ 1587.458924]   ISV = 0, ISS = 0x00000004                                    
                                                                                
                                                                             
- [ 1587.462977]   CM = 0, WnR = 0                                              
                                                                                
                                                                             
- [ 1587.465945] user pgtable: 4k pages, 48-bit VAs, pgdp=00000003cc276000      
                                                                                
                                                                             
- [ 1587.472420] [000000000000004c] pgd=0000000000000000                        
                                                                                
                                                                             
- [ 1587.477324] Internal error: Oops: 96000004 [#1] PREEMPT SMP                
                                                                                
                                                                             
+ [ 1587.432929]   ESR = 0x96000004[ 1587.566060] CPU: 2 PID: 2212 Comm: kworker/2:3 Tainted: G           OE     5.4.0-1007-bluefield #10-Ubuntu
+ [ 1587.578523] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.6.0-17-gd17a51a Mar  5 2021
+ 
+ [ 1587.579483] Unable to handle kernel NULL pointer dereference at virtual address 000000000000006c
+ [ 1587.589851] Workqueue: events rht_deferred_worker
+ [ 1587.436025]   EC = 0x25: DABT [ 1587.589856] pstate: 80000005 (Nzcv daif 
-PAN -UAO)
+ (current EL), IL = 32 bits
+ [ 158[ 1587.589859] pc : rhashtable_rehash_table+0xfc/0x410
+ 7.441377]   SET = 0, FnV = 0
+ [ 1587.447279]   EA = 0, S1PTW = 0
+ [ 1587.453188] Data abort info:
+ [ 1587.458924]   ISV = 0, ISS = 0x00000004
+ [ 1587.462977]   CM = 0, WnR = 0
+ [ 1587.465945] user pgtable: 4k pages, 48-bit VAs, pgdp=00000003cc276000
+ [ 1587.472420] [000000000000004c] pgd=0000000000000000
+ [ 1587.477324] Internal error: Oops: 96000004 [#1] PREEMPT SMP
  [ 1587.485641] Modules linked in: act_mirred act_skbedit xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat bpfilter br_netfilter bridge xfrm_user xfrm_algo target_core_mod 8021q garp stp mrp llc ovk
- [ 1587.566060] CPU: 2 PID: 2212 Comm: kworker/2:3 Tainted: G           OE     
5.4.0-1007-bluefield #10-Ubuntu                                                 
                                                                             
- [ 1587.578523] Hardware name: https://www.mellanox.com BlueField 
SoC/BlueField SoC, BIOS BlueField:3.6.0-17-gd17a51a Mar  5 2021                 
                                                                                
          
- [ 1587.579483] Unable to handle kernel NULL pointer dereference at virtual 
address 000000000000006c                                                        
                                                                                
- [ 1587.589851] Workqueue: events rht_deferred_worker                          
                                                                                
                                                                             
- [ 1587.589856] pstate: 80000005 (Nzcv daif -PAN -UAO)                         
                                                                                
                                                                             
- [ 1587.589859] pc : rhashtable_rehash_table+0xfc/0x410                        
                                                                                
                                                                             
- [ 1587.589861] lr : rhashtable_rehash_table+0x2e0/0x410                       
                                                                                
                                                                             
- [ 1587.589862] sp : ffff800013ebbcf0                                          
                                                                                
                                                                             
- [ 1587.589864] x29: ffff800013ebbcf0 x28: ffff0002cd8138b0                    
                                                                                
                                                                             
- [ 1587.589866] x27: 000000000000004c x26: ffff0002d3c00000                    
                                                                                
                                                                             
- [ 1587.589873] x25: ffff0002cd8138b1 x24: ffff0002cd800000                    
                                                                                
                                                                             
- [ 1587.598798] Mem abort info:                                                
                                                                                
                                                                             
- [ 1587.603483] x23: ffff0003ebbf5700 x22: 000000000000270e                    
                                                                                
                                                                             
- [ 1587.603486] x21: 000000000000004c x20: ffff00030cc8f400                    
                                                                                
                                                                             
- [ 1587.603487] x19: ffff0002fabf5a28 x18: ffff800008fe8000                    
                                                                                
                                                                             
- [ 1587.603489] x17: 000000002610869a x16: 00000000e3e01d27                    
                                                                                
                                                                             
- [ 1587.603491] x15: 0000000000000000 x14: 06000000a46b5000                    
                                                                                
                                                                             
- [ 1587.603492] x13: 4301001003000030 x12: 0000040000000000                    
                                                                                
                                                                             
- [ 1587.603494] x11: 0000000000000000 x10: 0000000000000001                    
                                                                                
                                                                             
- [ 1587.603496] x9 : 0000000020000000 x8 : 0000000000000000                    
                                                                                
                                                                             
- [ 1587.603497] x7 : 0000000000000001 x6 : ffff0002d3c6c301                    
                                                                                
                                                                             
+ [ 1587.566060] CPU: 2 PID: 2212 Comm: kworker/2:3 Tainted: G           OE     5.4.0-1007-bluefield #10-Ubuntu
+ [ 1587.578523] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.6.0-17-gd17a51a Mar  5 2021
+ [ 1587.579483] Unable to handle kernel NULL pointer dereference at virtual address 000000000000006c
+ [ 1587.589851] Workqueue: events rht_deferred_worker
+ [ 1587.589856] pstate: 80000005 (Nzcv daif -PAN -UAO)
+ [ 1587.589859] pc : rhashtable_rehash_table+0xfc/0x410
+ [ 1587.589861] lr : rhashtable_rehash_table+0x2e0/0x410
+ [ 1587.589862] sp : ffff800013ebbcf0
+ [ 1587.589864] x29: ffff800013ebbcf0 x28: ffff0002cd8138b0
+ [ 1587.589866] x27: 000000000000004c x26: ffff0002d3c00000
+ [ 1587.589873] x25: ffff0002cd8138b1 x24: ffff0002cd800000
+ [ 1587.598798] Mem abort info:
+ [ 1587.603483] x23: ffff0003ebbf5700 x22: 000000000000270e
+ [ 1587.603486] x21: 000000000000004c x20: ffff00030cc8f400
+ [ 1587.603487] x19: ffff0002fabf5a28 x18: ffff800008fe8000
+ [ 1587.603489] x17: 000000002610869a x16: 00000000e3e01d27
+ [ 1587.603491] x15: 0000000000000000 x14: 06000000a46b5000
+ [ 1587.603492] x13: 4301001003000030 x12: 0000040000000000
+ [ 1587.603494] x11: 0000000000000000 x10: 0000000000000001
+ [ 1587.603496] x9 : 0000000020000000 x8 : 0000000000000000
+ [ 1587.603497] x7 : 0000000000000001 x6 : ffff0002d3c6c301
  [ 1587.603499] x5 : ffff0002d3c00040 x4 : ffff00030cc8f400
  [ 1587.603501] x3 : 00000000000138b0 x2 : ffff00030cc8f401
  [ 1587.603503] x1 : ffff00030cc8f400 x0 : 0000000000000000
  [ 1587.603505] Call trace:
  [ 1587.603515]  rhashtable_rehash_table+0xfc/0x410
  [ 1587.603517]  rht_deferred_worker+0x18c/0x298
  [ 1587.603523]  process_one_work+0x1c4/0x480
  [ 1587.603531]  worker_thread+0x54/0x430
  [ 1587.603533]  kthread+0x138/0x150
  [ 1587.603537]  ret_from_fork+0x10/0x1c
  [ 1587.603542] Code: d2800014 14000003 aa1b03f4 aa1503fb (f9400375)
  [ 1587.603554] ---[ end trace 8b876994a5c4b259 ]---
  [ 1587.603558] Kernel panic - not syncing: Fatal exception in interrupt
  [ 1587.611162]   ESR = 0x96000004
  [ 1587.589861] lr : rhashtable_rehash_table+0x2e0/0x410
  [ 1587.589862] sp : ffff800013ebbcf0
  [ 1587.589864] x29: ffff800013ebbcf0 x28: ffff0002cd8138b0
  [ 1587.589866] x27: 000000000000004c x26: ffff0002d3c00000
  [ 1587.589873] x25: ffff0002cd8138b1 x24: ffff0002cd800000
  [ 1587.598798] Mem abort info:
  [ 1587.603483] x23: ffff0003ebbf5700 x22: 000000000000270e
  [ 1587.603486] x21: 000000000000004c x20: ffff00030cc8f400
  [ 1587.603487] x19: ffff0002fabf5a28 x18: ffff800008fe8000
  [ 1587.603489] x17: 000000002610869a x16: 00000000e3e01d27
  [ 1587.603491] x15: 0000000000000000 x14: 06000000a46b5000
  [ 1587.603492] x13: 4301001003000030 x12: 0000040000000000
  [ 1587.603494] x11: 0000000000000000 x10: 0000000000000001
  [ 1587.603496] x9 : 0000000020000000 x8 : 0000000000000000
  [ 1587.603497] x7 : 0000000000000001 x6 : ffff0002d3c6c301
  [ 1587.603499] x5 : ffff0002d3c00040 x4 : ffff00030cc8f400
  [ 1587.603501] x3 : 00000000000138b0 x2 : ffff00030cc8f401
  [ 1587.603503] x1 : ffff00030cc8f400 x0 : 0000000000000000
  [ 1587.603505] Call trace:
  [ 1587.603515]  rhashtable_rehash_table+0xfc/0x410
  [ 1587.603517]  rht_deferred_worker+0x18c/0x298
  [ 1587.603523]  process_one_work+0x1c4/0x480
  [ 1587.603531]  worker_thread+0x54/0x430
  [ 1587.603533]  kthread+0x138/0x150
  [ 1587.603537]  ret_from_fork+0x10/0x1c
  [ 1587.603542] Code: d2800014 14000003 aa1b03f4 aa1503fb (f9400375)
  [ 1587.603554] ---[ end trace 8b876994a5c4b259 ]---
  [ 1587.603558] Kernel panic - not syncing: Fatal exception in interrupt
  [ 1587.611162]   ESR = 0x96000004
  [ 1587.911485] SMP: stopping secondary CPUs
  [ 1587.911541] Kernel Offset: disabled
  [ 1587.911545] CPU features: 0x0002,20006008
  [ 1587.911547] Memory Limit: none
  [ 1588.062206] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
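
When re-running the testcase, a saved console log can be checked for the same crash signature as in the trace above. The following is a minimal sketch, not part of the original report: the function name is hypothetical, and it only looks for the two strings that appear in this trace (`rhashtable_rehash_table` and `Kernel panic - not syncing`).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given log file
# contains both markers seen in the crash reported in this bug.
has_ct_offload_crash() {
    log_file="$1"
    # The faulting function from the "pc :" line and the call trace.
    grep -q 'rhashtable_rehash_table' "$log_file" &&
        # The final panic message.
        grep -q 'Kernel panic - not syncing' "$log_file"
}
```

For example, `has_ct_offload_crash console.log && echo "crash reproduced"`, where console.log is whatever file the serial console or dmesg output was captured to.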

** Changed in: linux-bluefield (Ubuntu Focal)
       Status: Triaged => In Progress

** Changed in: linux-bluefield (Ubuntu Focal)
     Assignee: (unassigned) => Roi Dayan (roidayan)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1922672

Title:
  kernel crash with stress CT offload traffic

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922672/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
