List corruption from ipv6_route_seq_start

2021-01-14 stranche
Hi everyone, We've had a list corruption reported to us when using the /proc/net/ipv6_route file to read the routing information on the system on the 5.4.61 kernel. From the list pointers, it seems that the list_head in the fib6_walker has been reinitialized with INIT_LIST_HEAD() in ipv6_rout

Re: Refcount mismatch when unregistering netdevice from kernel

2021-01-04 stranche
On 2020-12-11 09:10, David Ahern wrote: Could we further distinguish between dst added to the uncached list by icmp6_dst_alloc() and xfrm6_fill_dst(), and confirm which ones are the ones leaking reference? I suspect it would be the xfrm ones, but I think it is worth verifying. After diggi

Re: Refcount mismatch when unregistering netdevice from kernel

2020-12-10 stranche
BTW, have you tried your previous proposed patch and confirmed it would fix the issue? Yes, we shared this with the customer and the refcount mismatch still occurred, so this doesn't seem sufficient either. Could we further distinguish between dst added to the uncached list by icmp6_dst_all

Re: Refcount mismatch when unregistering netdevice from kernel

2020-12-08 stranche
Hi Wei and Eric, Thanks for the replies. This was reported to us on the 5.4.61 kernel during a customer regression suite, so we don't have an exact reproducer unfortunately. From the trace logs we've added it seems like this is happening during IPv6 transport mode XFRM data transfer and the d

Refcount mismatch when unregistering netdevice from kernel

2020-12-07 stranche
Hi everyone, We've recently been investigating a refcount problem when unregistering a netdevice from the kernel. It seems that the IPv6 module is still holding various references to the inet6_dev associated with the main netdevice struct that are not being released, preventing the unregistra

Re: [PATCH net] genetlink: take netlink table lock when (un)registering

2020-06-29 stranche
On 2020-06-27 12:55, Cong Wang wrote: On Fri, Jun 26, 2020 at 5:32 PM Sean Tranchetti wrote: A potential deadlock can occur during registering or unregistering a new generic netlink family between the main nl_table_lock and the cb_lock where each thread wants the lock held by the other, as

Re: WARN_ON in TLP causing RT throttling

2018-09-28 stranche
On 2018-09-27 18:25, Eric Dumazet wrote: On 09/27/2018 05:16 PM, stran...@codeaurora.org wrote: Hi Yuchung, Based on the dumps we were able to get, it appears that TFO was not used in this case. We also tried some local experiments where we dropped incoming SYN packets after already successf

Re: WARN_ON in TLP causing RT throttling

2018-09-27 stranche
On 2018-09-27 13:14, Yuchung Cheng wrote: On Wed, Sep 26, 2018 at 5:09 PM, Eric Dumazet wrote: On 09/26/2018 04:46 PM, stran...@codeaurora.org wrote: Hi Eric, Someone recently reported a crash to us on the 4.14.62 kernel where excessive WARNING prints were spamming the logs and causi

WARN_ON in TLP causing RT throttling

2018-09-26 stranche
Hi Eric, Someone recently reported a crash to us on the 4.14.62 kernel where excessive WARNING prints were spamming the logs and causing watchdog bites. The kernel does have the following commit by Soheil: bffd168c3fc5 ("tcp: clear tp->packets_out when purging write queue"). Before this bug we s

Re: [PATCH net] af_key: free SKBs under RCU protection

2018-09-24 stranche
On 2018-09-23 11:15, Eric Dumazet wrote: On 09/20/2018 12:25 PM, stran...@codeaurora.org wrote: Perhaps a cleaner solution here is to always clone the SKB in pfkey_broadcast_one(). That will ensure that the two kfree_skb() calls in pfkey_broadcast() will never be passed an SKB with sock_rfree()

Re: [PATCH net] af_key: free SKBs under RCU protection

2018-09-21 stranche
On 2018-09-21 11:40, Eric Dumazet wrote: On 09/21/2018 10:09 AM, stran...@codeaurora.org wrote: I also tried reverting 7f6b9dbd5afb ("af_key: locking change") and running the test there and I still see the crash, so it doesn't seem to be an RCU specific issue. Is there anything else that cou

Re: [PATCH net] af_key: free SKBs under RCU protection

2018-09-21 stranche
As long as one skb has sock_rfree as its destructor, the socket attached to this skb cannot be released. There is no race here. Note that skb_clone() does not propagate the destructor. The issue here is that in the rcu lookup, we can find a socket that has been dismantled, with a 0 refcou

Re: [PATCH net] af_key: free SKBs under RCU protection

2018-09-20 stranche
I do not believe the changelog or the patch makes sense. Having an skb still referencing a socket prevents this socket from being released. If you think about it, what would prevent the freeing from happening _before_ the rcu_read_lock() in pfkey_broadcast()? Maybe the correct fix is that pfkey_broadcas

Re: [PATCH net-next] udp: Fix kernel panic in UDP GSO path

2018-05-14 stranche
On 2018-05-11 17:16, Willem de Bruijn wrote: Hmm, no, we absolutely need to fix GSO instead. Think of a bonding device (or any virtual devices), your patch won't avoid the crash. Hi Eric. Can you clarify what you mean by "fix GSO"? Is that just having the GSO path work regardless of whether