One question is why the RTO gets so large that it limits failover.
If Linux TCP is working correctly, RTO should be roughly srtt + 4*rttvar
(the RFC 6298 form). So either there is a huge srtt or variance, or
something is going wrong with RTT estimation. Even with some generous
maximums of srtt = 500 ms and rttvar = 250 ms, RTO would only be 1.5 seconds.
I suspect that what is happening here is that a link in a trunk
somewhere goes down for some number of seconds, so a given TCP
segment is retransmitted several times, with the RTO doubling on
each retransmission.
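
To put rough numbers on that, here is a small sketch of the backoff. It
is only an illustration, not the kernel's code: the function names are
made up, the 1-second starting RTO is an assumed round figure, the outage
is assumed to begin just as the segment is first sent, and the only
Linux-specific value is the 120-second cap on the backed-off RTO
(TCP_RTO_MAX).

```python
TCP_RTO_MAX_S = 120.0  # Linux caps the backed-off RTO at 120 seconds


def backoff_schedule(initial_rto_s, retransmits, rto_max_s=TCP_RTO_MAX_S):
    """Return (rto, elapsed) pairs, one per retransmission.

    rto is the timer that was armed before the retransmission, and
    elapsed is the time since the original send at which it fires.
    """
    schedule = []
    rto = initial_rto_s
    elapsed = 0.0
    for _ in range(retransmits):
        elapsed += rto
        schedule.append((rto, elapsed))
        # Each timeout doubles the RTO, up to the cap.
        rto = min(rto * 2, rto_max_s)
    return schedule


def recovery_time(initial_rto_s, outage_s, rto_max_s=TCP_RTO_MAX_S):
    """Time of the first retransmission sent at or after the outage ends.

    Assumes the outage starts when the segment is first sent and that
    initial_rto_s is positive.
    """
    rto = initial_rto_s
    elapsed = 0.0
    while True:
        elapsed += rto
        if elapsed >= outage_s:
            return elapsed
        rto = min(rto * 2, rto_max_s)
```

With a 1-second starting RTO, backoff_schedule(1.0, 5) puts the
retransmissions at 1, 3, 7, 15 and 31 seconds, and recovery_time(1.0, 10.0)
returns 15.0: a 10-second link outage stalls the connection for 15
seconds, because the retransmission at 7 seconds is lost and the next one
is not due until 15.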
rick jones