From: Andrew Morton <[EMAIL PROTECTED]>
Date: Sun, 31 Jul 2005 15:12:51 -0700

> I've been trying to upgrade the kernel from 2.6.12.3 to 2.6.13-rc4 on a
> rather loaded http server, but I'm currently getting a kernel panic only
> a few minutes after booting. The bug is reproducible (the crash
> happened after every reboot, with the same backtrace).

The two bug checks there are supposed to be impossible.
I wonder how this can trigger other than due to some bizarre
memory corruption, but it's too precise a BUG() for it
to really be something like that.

The first check is that tcp_skb_pcount() is not equal to one.
The caller of tcp_tso_should_defer() (where the BUG() is
triggering) looks like this:

static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
{
 ...
        tso_segs = tcp_init_tso_segs(sk, skb);
 ...
        while (likely(tcp_snd_wnd_test(tp, skb, mss_now))) {
                BUG_ON(!tso_segs);
 ...
                if (tso_segs == 1) {
 ...
                } else {
                        if (tcp_tso_should_defer(sk, tp, skb))
                                break;
                }
 ...
                skb = sk->sk_send_head;
                if (!skb)
                        break;
                tso_segs = tcp_init_tso_segs(sk, skb);
        }
 ...
}

So tso_segs is _always_ updated to be the tcp_skb_pcount(skb)
value, and due to the "if (tso_segs == 1)" test it can never
be "1" when we get to tcp_tso_should_defer().
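
To make that argument concrete, here is a small user-space sketch of the
same control flow. It is not kernel code: struct toy_skb, toy_walk_queue()
and the array standing in for the send queue are hypothetical names made up
for illustration, and the sketch assumes nothing modifies an skb between
the point where tso_segs is refreshed and the point where it is tested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the segment count matters here. */
struct toy_skb {
    int pcount;                           /* models tcp_skb_pcount(skb) */
};

/*
 * Walks the queue the way the loop above does and returns how many times
 * the multi-segment (defer) branch was entered with a single-segment skb.
 */
static int toy_walk_queue(const struct toy_skb *queue, size_t n)
{
    int violations = 0;
    size_t i = 0;
    int tso_segs;

    if (n == 0)
        return 0;

    tso_segs = queue[i].pcount;           /* models tcp_init_tso_segs() */
    for (;;) {
        assert(tso_segs != 0);            /* models BUG_ON(!tso_segs) */
        if (tso_segs == 1) {
            /* single-segment path: the defer check is never called */
        } else {
            /* multi-segment path: the defer check would run here */
            if (queue[i].pcount <= 1)
                violations++;
        }
        if (++i == n)                     /* models "if (!skb) break;" */
            break;
        tso_segs = queue[i].pcount;       /* refreshed for the next skb */
    }
    return violations;
}
```

With any mix of segment counts in the queue, the violation counter stays at
zero as long as tso_segs is refreshed from the same skb that is then tested,
which is exactly the property the excerpt relies on.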

That leaves the other branch of the assertion, namely:

     (tp->snd_cwnd <= in_flight)

First, tcp_tso_should_defer() checks for the special case
of the FIN bit being set, which causes us to return early
and not get to the assertion check, like so:

        if (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN)
                return 0;

That is the only exception to the "(tp->snd_cwnd <= in_flight)"
rule.
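
Putting the two branches and the FIN exemption together, the condition can
be sketched as a plain predicate. This is a paraphrase of the description
in this message, not a copy of the kernel source: struct toy_state,
TOY_FLAG_FIN (and its value) and toy_defer_would_bug() are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FLAG_FIN 0x01   /* hypothetical value, stands in for TCPCB_FLAG_FIN */

/* Hypothetical snapshot of the values the check looks at. */
struct toy_state {
    unsigned int flags;      /* models TCP_SKB_CB(skb)->flags */
    int pcount;              /* models tcp_skb_pcount(skb) */
    unsigned int snd_cwnd;   /* models tp->snd_cwnd */
    unsigned int in_flight;  /* models the in_flight count */
};

/* Returns true if the assertion, as described above, would fire. */
static bool toy_defer_would_bug(const struct toy_state *s)
{
    if (s->flags & TOY_FLAG_FIN)
        return false;        /* early return: the assertion is never reached */

    /* First branch: single-segment skb. Second branch: no cwnd room left. */
    return s->pcount == 1 || s->snd_cwnd <= s->in_flight;
}
```

For example, a FIN skb with snd_cwnd below in_flight does not trip the
predicate, while a non-FIN multi-segment skb with snd_cwnd equal to
in_flight does.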

Next, the top level tcp_write_xmit() congestion window tracking
looks like:

static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
{
        ...
        cwnd_quota = tcp_cwnd_test(tp, skb);
        if (unlikely(!cwnd_quota))
                goto out;
        ...
        while (likely(tcp_snd_wnd_test(tp, skb, mss_now))) {
        ...
                if (unlikely(tcp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC))))
                        break;
        ...
                update_send_head(sk, tp, skb);
        ...
                cwnd_quota -= tcp_skb_pcount(skb);

                BUG_ON(cwnd_quota < 0);
                if (!cwnd_quota)
                        break;
        }
        ...
}

1) cwnd_quota is initialized to the value:
        (tp->snd_cwnd - tcp_packets_in_flight(tp))
   at the top of tcp_write_xmit(), as long as this value
   is positive, else zero.

2) cwnd_quota is decremented by tcp_skb_pcount(skb) for every
   packet we send.

3) In parallel, tp->packets_out is incremented by tcp_skb_pcount(skb)
   as each packet goes out (via update_send_head()).

Therefore, cwnd_quota must decrease exactly as much as
tcp_packets_in_flight(tp) increases.  This should
keep everything in check.
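
The three steps above can be checked with a toy user-space model. Again,
this is only a sketch under stated assumptions: struct toy_tp,
toy_cwnd_quota() and toy_write_xmit() are hypothetical names, the in-flight
count is simplified to packets_out alone, and every skb is assumed to fit
in the remaining quota (whatever the elided parts of the real function do
to guarantee that is not modeled).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct tcp_sock. */
struct toy_tp {
    unsigned int snd_cwnd;
    unsigned int packets_out;   /* simplification: treated as packets in flight */
};

/* Step 1: the remaining window, clamped at zero. */
static int toy_cwnd_quota(const struct toy_tp *tp)
{
    if (tp->snd_cwnd > tp->packets_out)
        return (int)(tp->snd_cwnd - tp->packets_out);
    return 0;
}

/* Sends skbs (given as segment counts) while quota remains; returns skbs sent. */
static size_t toy_write_xmit(struct toy_tp *tp, const int *pcounts, size_t n)
{
    size_t sent = 0;
    int cwnd_quota = toy_cwnd_quota(tp);

    if (!cwnd_quota)
        return 0;

    while (sent < n) {
        tp->packets_out += (unsigned int)pcounts[sent];  /* step 3 */
        cwnd_quota -= pcounts[sent];                      /* step 2 */
        sent++;

        /* The quota tracks the remaining window exactly. */
        assert(cwnd_quota == (int)tp->snd_cwnd - (int)tp->packets_out);
        assert(cwnd_quota >= 0);          /* models BUG_ON(cwnd_quota < 0) */
        if (!cwnd_quota)
            break;
    }
    return sent;
}
```

Starting from snd_cwnd = 10 and packets_out = 4, feeding skbs of 1, 2, 3
and 1 segments sends the first three, leaves packets_out equal to snd_cwnd,
and stops with the quota at zero, so a second call sends nothing.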

There are no SMP issues as the socket is fully locked for this
entire code path.

In short, I'm stumped :-)