Hi guys,

Well, I have been reading net code seriously for two days, so I am still basically a complete network klutz. But we have a nasty network-related vm deadlock that needs fixing, and there seems to be little choice but to wade in and try to sort things out.
Here is the basic problem:

   http://lwn.net/Articles/129703/

There was discussion of this at the Kernel Summit:

   http://lwn.net/Articles/144273/

I won't discuss this further except to note that people are shooting way wide of the mark by talking about throttling user processes. It is the block IO paths that need throttling, nothing more and nothing less. When a block IO path extends over the network, then the network protocol involved needs throttling. More specifically, the memory usage of the network part of the path needs to be bounded, network memory needs to be drawn from a reserve corresponding to the known bound, and we need to ensure that the number of requests in flight is bounded in order to know how big the memory reserve needs to be.

A couple of details interact to make this hard:

  1) There may be other traffic on a network interface than just the block IO protocol. We need to ensure the block IO traffic gets through, perhaps at the expense of dropping other traffic at critical times.

  2) Memory is allocated for packet buffers in the net interface drivers, before we have decoded any protocol headers, and thus before we even know whether a particular packet is involved in network block IO.

At OLS, I heard of an interesting proposal to attack this problem, apparently put forth at the networking summit shortly before. The idea is to support multiple MAC addresses per interface, using ARP proxy techniques. The MAC address in each ethernet packet can then be used to denote a particular kind of traffic on an interface, e.g., block IO traffic. This is only part of a solution, of course: we would still have to do some form of throttling even if we did not have to worry about unrelated traffic.

This technique does seem workable to me, but I would prefer a more local solution if one is to be found. I think I have found one, but I need a reality check on my reasoning, which is the purpose of this post.
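To make the bounding argument concrete, here is a minimal sketch in C of a per-path throttle. Everything in it is an assumption made for illustration: the struct, the function names, and the idea of a fixed worst-case memory cost per request do not come from any existing kernel interface. It only shows that once the number of requests in flight is capped, the required reserve follows by multiplication.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical throttle for one network block IO path. */
struct io_throttle {
    unsigned max_in_flight;     /* assumed cap on outstanding requests */
    unsigned in_flight;         /* requests currently outstanding */
    size_t   bytes_per_request; /* assumed worst-case network memory per request */
};

/* Admit a new request only while we are under the cap. */
static bool throttle_try_start(struct io_throttle *t)
{
    if (t->in_flight >= t->max_in_flight)
        return false;           /* caller must wait for a completion */
    t->in_flight++;
    return true;
}

/* Called when a request completes and its network memory is released. */
static void throttle_finish(struct io_throttle *t)
{
    if (t->in_flight > 0)
        t->in_flight--;
}

/* With in-flight requests capped, the reserve has a known upper bound. */
static size_t throttle_reserve_bytes(const struct io_throttle *t)
{
    return (size_t)t->max_in_flight * t->bytes_per_request;
}
```

For example, a cap of 2 requests at an assumed 4096 bytes each gives a reserve bound of 8192 bytes; a third request is refused until one of the first two finishes.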
Here is the plan:

  * All protocols used on an interface that supports block IO must be vm-aware.

    If we wish, we can leave it up to the administrator to ensure that only vm-aware protocols are used on an interface that supports block IO, or we can do some automatic checking.

  * Any socket to be used for block IO will be marked as a "vmhelper".

    The number of protocols that need to have this special knowledge is quite small, e.g.: tcp, udp, sctp, icmp, arp, maybe a few others. We are talking about a line or two of code in each to add the necessary awareness.

  * Inside the network driver, when memory is low we will allocate space for every incoming packet from a memory reserve, regardless of whether it is related to block IO or not.

  * Under low memory, we call the protocol layer synchronously instead of queuing the packet through softnet.

    We do not necessarily have to bypass softnet, since there is a mechanism for throttling packets at this point. However, there is a big problem with throttling here: we haven't classified the packet yet, so the throttling might discard some block IO packets, which is exactly what we don't want to do under memory pressure.

  * The protocol receive handler does the socket lookup, then if memory is low, discards any packet not belonging to a vmhelper socket.

Roughly speaking, the driver allocates each skb via:

    skb = memory_pressure ? dev_alloc_skb_reserve() : dev_alloc_skb();

Then the driver hands off the packet to netif_rx, which does:

    if (from_reserve(skb)) {
        netif_receive_skb(skb);
        return;
    }

And in the protocol handler we have:

    if (memory_pressure && !is_vmhelper(sock) && from_reserve(skb))
        goto drop_the_packet;

That is pretty much it. Now, being a net newbie, it is not entirely clear to me that we can call netif_receive_skb directly when packets are also being queued through the softnet interface. May I have some guidance on this point, please?

If that works, I am prepared to justify and prove the rest.
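As a rough illustration of how the three fragments above fit together, here is a minimal userspace model in C. It is a sketch under stated assumptions, not kernel code: the structs, the function names (rx_alloc, rx_handoff, protocol_receive, rx_one), the fixed reserve of four buffers, and the simplification that a delivered packet is consumed immediately are all hypothetical. It does not answer the open question about mixing direct netif_receive_skb calls with softnet queuing.

```c
#include <assert.h>
#include <stdbool.h>

#define RESERVE_SLOTS 4          /* assumed size of the emergency reserve */

struct packet {
    bool from_reserve;           /* buffer came from the reserve pool */
    bool for_vmhelper;           /* destination socket is marked vmhelper */
};

struct rx_state {
    bool memory_pressure;        /* set when memory is low */
    int  reserve_free;           /* reserve buffers still available */
    int  delivered;              /* reserve packets accepted by the protocol */
    int  dropped;                /* reserve packets discarded by the protocol */
    int  queued;                 /* packets deferred on the normal path */
};

/* Driver side: choose the pool before the packet can be classified. */
static bool rx_alloc(struct rx_state *st, struct packet *pkt, bool for_vmhelper)
{
    pkt->for_vmhelper = for_vmhelper;
    pkt->from_reserve = false;
    if (st->memory_pressure) {
        if (st->reserve_free == 0)
            return false;        /* reserve exhausted: packet is lost */
        st->reserve_free--;
        pkt->from_reserve = true;
    }
    return true;
}

/* Protocol side: after socket lookup, drop reserve packets that are
 * not destined for a vmhelper socket. */
static void protocol_receive(struct rx_state *st, struct packet *pkt)
{
    if (st->memory_pressure && !pkt->for_vmhelper && pkt->from_reserve)
        st->dropped++;
    else
        st->delivered++;
    if (pkt->from_reserve) {
        /* Model assumption: the buffer returns to the reserve at once,
         * whether the packet was dropped or consumed. */
        assert(st->reserve_free < RESERVE_SLOTS);
        st->reserve_free++;
    }
}

/* Hand-off: reserve packets go straight to the protocol layer,
 * everything else takes the normal deferred path. */
static void rx_handoff(struct rx_state *st, struct packet *pkt)
{
    if (pkt->from_reserve) {
        protocol_receive(st, pkt);
        return;
    }
    st->queued++;
}

/* Receive one packet end to end; returns false if no buffer was available. */
static bool rx_one(struct rx_state *st, bool for_vmhelper)
{
    struct packet pkt;
    if (!rx_alloc(st, &pkt, for_vmhelper))
        return false;
    rx_handoff(st, &pkt);
    return true;
}
```

In this model, a packet received without memory pressure is simply queued. Under pressure, a packet for an ordinary socket is allocated from the reserve, classified synchronously and dropped, while a packet for a vmhelper socket is delivered. In both cases the reserve buffer is released straight away, so unrelated traffic cannot pin the reserve.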
Regards,

Daniel