A few questions.

- First off, why am I not seeing the original mail in this thread, even
  when I search the mail archives (e.g.,
  https://lkml.org/lkml/2017/11/13/954)?

- Girish Moodalbail writes:

> The issue here is that we are trying to reference a network namespace
> (struct net *) that is long gone (i.e., L532 below -- c_net is the culprit).

The netns is not "long gone"; we are still processing the
NETDEV_UNREGISTER_FINAL for loopback.

As I said in my earlier mail, the idea is to extract the list of unique
conns that belong to the netns and then destroy each conn along with all
of its associated paths.

Thus there can only be a single thread going through rds_tcp_kill_sock
at any time, since we should only get the unregister_final/loopback
one time for the netns. (See also the comment block in rds_tcp_dev_event
about network activity quiescing.) So there should be no concurrency
issue.

However, having just checked this, I think there may be an rds connection
refcounting bug. In a quick test, I'm not seeing the expected calls to
conn_path_destroy. I'll need some time to take a look. This has been
known to work, so something got broken along the way.

> I think we should move away from global list to a per-namespace list. The
> global list are used only in two places (both of which are per-namespace
> operations):

Let's first understand the real root cause before we start redesigning
data structures.

--Sowmini