Hello.

On 04/05/2016 07:20 PM, Jon Maloy wrote:
When a link is down, it will continuously try to re-establish contact with the peer by sending out a RESET or and ACTIVATE message at each
And/or?
timeout interval. The default value for this interval is currently 375 ms. This is wasteful, and may become a problem in very large clusters with dozens or hundereds of nodes being down simultaneously.
Hundreds.
We now introduce a simple backoff algorithm for these cases. The first five messages are sent at default rate; thereafter a message is sent only each 16't timer interval.
16th?
This will cover the vast majority of link recyling cases, since the
Recycling.
endpoint starting last will transmit at the higher speed, and the link should normally be established well before the rate needs to be reduced. The only case where we will see a degradation of link re-establishment is when the endpoins remain intact, and a glitch in the transmission
Endpoints.
media is causing the link reset. We will then experience a worst-case re-establishing time of 6 seconds, something we deem acceptable.

Acked-by: Ying Xue <ying....@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
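
For readers following the thread, the backoff described in the commit message can be sketched roughly as below. This is a standalone illustration, not the code from the patch: the struct, the function and the macro names are all hypothetical, and the exact tick on which the slow phase fires is an assumption. Only the three constants (five fast messages, every 16th interval, 375 ms) come from the text.

```c
#include <stdbool.h>

#define FAST_ATTEMPTS   5    /* messages sent at the default rate */
#define BACKOFF_FACTOR  16   /* afterwards: one message per 16 timer ticks */
#define TIMER_INTVL_MS  375  /* default timeout interval quoted above */

/* Hypothetical per-link state; not the real TIPC link structure. */
struct link_backoff {
	unsigned int ticks_down;  /* timer ticks elapsed since the link went down */
};

/*
 * Called once per timer tick while the link is down.
 * Returns true if a RESET/ACTIVATE message should be sent on this tick.
 */
static bool backoff_should_send(struct link_backoff *lb)
{
	unsigned int n = lb->ticks_down++;

	/* Fast phase: the first five ticks each send a message. */
	if (n < FAST_ATTEMPTS)
		return true;

	/* Slow phase: send only on every 16th tick after the fast phase. */
	return (n - FAST_ATTEMPTS) % BACKOFF_FACTOR == BACKOFF_FACTOR - 1;
}
```

With these numbers a message goes out on ticks 0 to 4, then on ticks 20, 36, and so on. The gap between messages in the slow phase is 16 * 375 ms = 6000 ms, which matches the 6 second worst case mentioned in the commit message.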
[...]

MBR, Sergei