Hi guys,

Well, I have been reading net code seriously for two days, so I am still 
basically a complete network klutz.  But we have a nasty network-related vm 
deadlock that needs fixing, and there seems to be little choice but to wade in 
and try to sort things out.

Here is the basic problem:

   http://lwn.net/Articles/129703/

There was discussion of this at the Kernel Summit:

   http://lwn.net/Articles/144273/

I won't discuss this further except to note that people are shooting way wide 
of the mark by talking about throttling user processes.  It is the block IO 
paths that need throttling, nothing more and nothing less.  When a block IO 
path extends over the network, then the network protocol involved needs 
throttling.  More specifically, the memory usage of the network part of the 
path needs to be bounded, network memory needs to be drawn from a reserve 
corresponding to the known bound, and we need to ensure that the number of 
requests in flight is bounded in order to know how big the memory reserve 
needs to be.
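
To make that concrete, here is the sort of back-of-envelope sizing I have in
mind.  All of the constants and names below are invented, purely to show the
shape of the calculation:

        /*
         * Illustrative only: these constants stand for "the bound on
         * in-flight block IO requests" and "the worst-case network
         * memory each request can pin", nothing more.
         */
        #define NR_VMIO_REQUESTS      128   /* in-flight block IO requests */
        #define PACKETS_PER_REQUEST     8   /* worst-case packets per request */
        #define MAX_PACKET_BYTES     1500   /* MTU-sized payload */
        #define SKB_OVERHEAD_BYTES    512   /* rough skb + headroom overhead */

        static unsigned long vmio_reserve_bytes(void)
        {
                return (unsigned long)NR_VMIO_REQUESTS * PACKETS_PER_REQUEST *
                        (MAX_PACKET_BYTES + SKB_OVERHEAD_BYTES);
        }

With those example numbers the reserve works out to roughly 2 MB, which gives
an idea of the scale we are talking about.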

A couple of details interact to make this hard:

  1) There may be other traffic on a network interface than just the block IO
     protocol.  We need to ensure that the block IO traffic gets through,
     perhaps at the expense of dropping other traffic at critical times.

  2) Memory is allocated for packet buffers in the net interface drivers,
     before we have decoded any protocol headers, and thus, before we even
     know if a particular packet is involved in network block IO.

At OLS, I heard of an interesting proposal to attack this problem, apparently 
put forth at the networking summit shortly before.  The idea is to support 
multiple MAC addresses per interface, using ARP proxy techniques.  The MAC 
address in each ethernet packet can then be used to denote a particular kind 
of traffic on an interface, i.e., block IO traffic.  This is only part of a 
solution, of course; we would still have to do some form of throttling even 
if we did not have to worry about unrelated traffic.  This technique does seem 
workable to me, but I would prefer a more local solution if one is to be 
found.  I think I have found one, but I need a reality check on my reasoning, 
which is the purpose of this post.

Here is the plan:

  * All protocols used on an interface that supports block IO must be
    vm-aware.

If we wish, we can leave it up to the administrator to ensure that only 
vm-aware protocols are used on an interface that supports block IO, or we can 
do some automatic checking.

  * Any socket to be used for block IO will be marked as a "vmhelper".

The number of protocols that need to have this special knowledge is quite 
small, e.g.: tcp, udp, sctp, icmp, arp, maybe a few others.  We are talking 
about a line or two of code in each to add the necessary awareness.
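
To illustrate, the mark could be nothing more than a flag on the socket, set
by whatever block IO client (nbd, iscsi, and so on) owns the connection, at
the time it sets up its transport socket.  The names below are invented;
sk_vmhelper would be a new bit found somewhere in struct sock:

        /* sketch only: sk_vmhelper is a hypothetical new bit in struct sock */
        static inline void sock_set_vmhelper(struct sock *sk)
        {
                sk->sk_vmhelper = 1;
        }

        static inline int is_vmhelper(struct sock *sk)
        {
                return sk->sk_vmhelper;
        }

The per-protocol change is then essentially just the is_vmhelper() test in
the receive path, as sketched further down.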

  * Inside the network driver, when memory is low we will allocate space
    for every incoming packet from a memory reserve, regardless of whether
    it is related to block IO or not.

  * Under low memory, we call the protocol layer synchronously instead of
    queuing the packet through softnet.

We do not necessarily have to bypass softnet, since there is a mechanism for 
throttling packets at this point.  However, there is a big problem with 
throttling here: we haven't classified the packet yet, so the throttling 
might discard some block IO packets, which is exactly what we don't want to 
do under memory pressure.

  * The protocol receive handler does the socket lookup, then if memory is
    low, discards any packet not belonging to a vmhelper socket.

Roughly speaking, the driver allocates each skb via:

        skb = memory_pressure ? dev_alloc_skb_reserve(len) : dev_alloc_skb(len);

Then the driver hands off the packet to netif_rx, which does:

        if (from_reserve(skb)) {
                /* deliver synchronously, bypassing the softnet queue */
                netif_receive_skb(skb);
                return;
        }
        /* otherwise fall through to the usual softnet queueing path */

And in the protocol handler we have:

        if (memory_pressure && !is_vmhelper(sock) && from_reserve(skb))
                goto drop_the_packet;
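
For completeness, here is roughly how I picture dev_alloc_skb_reserve() and
from_reserve() (is_vmhelper was sketched earlier).  Again, this is only a
sketch: __GFP_VMRESERVE and skb->from_reserve do not exist, they just stand
for "allocate from the block IO reserve" and "a spare bit in struct sk_buff
that remembers where the buffer came from":

        /* sketch only: __GFP_VMRESERVE and skb->from_reserve are invented */
        static inline struct sk_buff *dev_alloc_skb_reserve(unsigned int length)
        {
                struct sk_buff *skb = alloc_skb(length + 16,
                                                GFP_ATOMIC | __GFP_VMRESERVE);

                if (skb) {
                        skb_reserve(skb, 16);   /* usual driver headroom */
                        skb->from_reserve = 1;  /* remember allocation source */
                }
                return skb;
        }

        static inline int from_reserve(struct sk_buff *skb)
        {
                return skb->from_reserve;
        }

Freeing such an skb would have to return the memory to the reserve, but that
is a detail.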

That is pretty much it.  Now, being a net newbie, it is not entirely clear to 
me that we can call netif_receive_skb directly when packets are also being 
queued through the softnet interface.  May I have some guidance on this 
point, please?

If that works, I am prepared to justify and prove the rest.

Regards,

Daniel