On 10/26/11 22:53, John Baldwin wrote:
On Wednesday, October 26, 2011 3:54:31 am Pawel Jakub Dawidek wrote:
On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote:
On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote:
On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote:
On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote:
My suggestion would be that if we won't be able to fix it before 9.0,
we should turn this assertion off, as the system seems to be able to
recover.
Shipped kernels have all assertions turned off.
Yes, I'm aware of that, but many people compile their production kernels
with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg.
corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing
it into a printf, so it will be visible.
No, the kernel is corrupting things in other places when this is true, so
if you are running with INVARIANTS, we want to know about it. Specifically,
in several places in TCP we assume that rcv_adv>= rcv_nxt, and depend on
being able to do 'rcv_adv - rcv_nxt'.
In this case, it looks like the difference is consistently less than one
frame. I suspect the other end of the connection is sending just beyond the
end of the advertised window (it probably assumes it is better to send a full
frame if it has that much pending data even though part of it is beyond the
window edge vs sending a truncated packet that just fills the window) and that
that frame is accepted ok in the header prediction case and it's ACK is
delayed, but the next packet to arrive then trips over this assumption.
Since 'win' is guaranteed to be non-negative and we explicitly cast
'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking
for:
tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
I think we already handle this case ok and perhaps the assertion can just be
removed? Not sure if others feel that it warrants a comment to note that this
is the case being handled.
I added debug to the places where rcv_adv and rcv_nxt are modified. Here
is what happens before the panic occurs:
tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022361548
rcv_adv 4022360100 diff -1448
tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022362298
rcv_adv 4022361548 diff -750
tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022363746
rcv_adv 4022362298 diff -1448
tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022364836
rcv_adv 4022363746 diff -1090
tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022366284
rcv_adv 4022364836 diff -1448
tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022370628
rcv_adv 4022369690 diff -938
tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022379140
rcv_adv 4022377692 diff -1448
tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022387792
rcv_adv 4022386344 diff -1448
tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022388890
rcv_adv 4022387792 diff -1098
tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022390338
rcv_adv 4022388890 diff -1448
tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022394563
rcv_adv 4022394342 diff -221
panic: tcp_input negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022394563
rcv_adv 4022394342 win=0 diff -221
I can send you the full log if you want, I've plenty of messages where
rcv_adv< rcv_nxt, not all of them trigger this assertion.
The assertion would be triggered when the next packet arrives (as I said
above). Try modifying your debugging output to also log if the ACK is
delayed. I suspect it is not delayed until the last one. (Pushing out an
ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case
of an immediate ACK, rcv_nxt> rcv_adv is only a transient condition all
under a single lock invocation so never visible to other consumers of the
protocol control block.) If that is what you see, then that confirms what
I guessed above and I will likely just remove the assertion in tcp_input()
and patch the timewait code to handle this case.
Pawel, have you been able to confirm John's hypothesis? What I don't
quite get is why we haven't had a lot more reports of this issue...
Cheers,
Lawrence
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"