From: Tuong Lien <tuong.t.l...@dektech.com.au>
Date: Mon, 11 Feb 2019 13:29:43 +0700

> When a link endpoint is re-created (e.g. after a node reboot or
> interface reset), its link session number is chosen at random, and the
> peer endpoint is synced with this new session number before the link
> is re-established.
> 
> However, there is a shortcoming in this mechanism that can leave the
> link never re-established, or re-established only to fail later. It
> happens when the peer endpoint is already in ESTABLISHING state, with
> 'peer_session' and the 'in_session' flag set, and this link endpoint
> suddenly leaves. When it comes back with a random session number,
> there are two possible situations:
> 
> 1/ If the random session number is larger than (or equal to) the
> previous one, the peer endpoint is updated with the new session upon
> receipt of a RESET_MSG from this endpoint, and the link can be
> re-established as normal. Otherwise, all RESET_MSGs from this endpoint
> are rejected by the peer. In turn, when this link endpoint receives an
> ACTIVATE_MSG from the peer, it moves to ESTABLISHED and starts sending
> STATE_MSGs, but these messages are also dropped by the peer because
> they carry the wrong session number.
> The peer link endpoint can still become ESTABLISHED after receiving a
> traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
> NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
> will be forced down sooner or later!
> 
> Even if the random session number is larger than the previous one,
> the ACTIVATE_MSG from the peer may arrive first, so that this link
> endpoint moves to ESTABLISHED before sending out any RESET_MSG.
> Consequently, the peer link is not updated with the new session
> number, and the same link failure scenario as above occurs.
> 
> 2/ Alternatively, the peer link endpoint may have been reset for some
> reason in the meantime: its link state went from ESTABLISHING to RESET
> but it is still in session, i.e. the 'in_session' flag was not
> reset...
> Now, if the random session number from this endpoint is less than the
> previous one, all RESET_MSGs from this endpoint are rejected by the
> peer. In the other direction, when this link endpoint receives a
> RESET_MSG from the peer, it moves to ESTABLISHING and starts sending
> ACTIVATE_MSGs, but all of these are rejected by the peer too.
> As a result, the link cannot be re-established and gets stuck with
> this link endpoint in state ESTABLISHING and the peer in RESET!
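
The rejection rules behind both situations can be modelled roughly as
below. This is only a sketch under stated assumptions, not the TIPC
code: struct peer_link and peer_accepts() are hypothetical names, the
"larger than (or equal to)" rule is taken from the description above,
the exact-match rule for STATE_MSG is an assumption, and session
numbers are compared as plain integers with no wrap-around handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Message types named in the description; the enum itself is illustrative. */
enum msg_type { RESET_MSG, ACTIVATE_MSG, STATE_MSG };

/* Hypothetical, reduced view of the peer's link endpoint. */
struct peer_link {
	bool in_session;        /* the 'in_session' flag */
	uint16_t peer_session;  /* last session number learned from the other end */
};

/* Returns true if the peer would accept a protocol message of the given
 * type carrying 'session'. Plain integer comparison, no wrap-around.
 */
static bool peer_accepts(const struct peer_link *p, enum msg_type type,
			 uint16_t session)
{
	switch (type) {
	case RESET_MSG:
	case ACTIVATE_MSG:
		/* Anything is accepted until a session has been agreed on;
		 * after that, older session numbers are rejected.
		 */
		return !p->in_session || session >= p->peer_session;
	case STATE_MSG:
		/* Assumption: STATE is only valid within the current session. */
		return p->in_session && session == p->peer_session;
	}
	return false;
}
```

With a peer that is in session at 100, an endpoint that comes back with
session 40 has its RESET_MSGs, ACTIVATE_MSGs and STATE_MSGs all
rejected, which is the stuck state described in both situations.
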
> 
> Solution:
> ===========
> 
> This link endpoint should not go directly to ESTABLISHED on receiving
> an ACTIVATE_MSG from the peer, since that message may belong to the
> old session if the link was re-created. To ensure the session is
> correct before the link is re-established, the peer endpoint in
> ESTABLISHING state sends back the last session number in its
> ACTIVATE_MSG for verification at this endpoint. Then, if needed, a new
> and more appropriate session number is generated to force a re-sync
> first.
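
A minimal sketch of that verification step follows. The names
(struct link_ep, activate_session_ok(), echo_valid) are hypothetical
and not taken from the patch. Two things are assumptions here: that
"more appropriate" can mean one past the echoed number, and that a
validity flag is how peers that do not echo a session are kept
working. Wrap-around of the 16-bit counter is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of this (re-created) link endpoint. */
struct link_ep {
	uint16_t session;  /* this endpoint's own session number */
};

/* Check the session number echoed back in the peer's ACTIVATE_MSG.
 * Returns true if the endpoint may move on to ESTABLISHED, false if it
 * must stay put and force a re-sync first.
 */
static bool activate_session_ok(struct link_ep *ep, bool echo_valid,
				uint16_t echoed)
{
	if (!echo_valid)
		return true;  /* peer sent no echo: keep the old behaviour */
	if (ep->session == echoed)
		return true;  /* peer already knows the current session */
	/* Peer is on a stale session. If our random number is lower than
	 * what the peer remembers, pick one it will accept (assumption:
	 * echoed + 1 is good enough).
	 */
	if (ep->session < echoed)
		ep->session = (uint16_t)(echoed + 1);
	return false;
}
```

For example, an endpoint re-created with session 40 that sees 100
echoed back would move to 101 and re-sync before the link comes up.
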
> 
> In addition, when a link in ESTABLISHING state is reset, its state
> moves to RESET according to the link FSM, and the 'in_session' flag
> (and the other data) is reset as in a normal link reset. The link is
> also deleted if requested.
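
The reset handling can be pictured as below. Again this is a sketch
only: the state enum, struct link_ep and link_reset() are illustrative
names, and returning a "delete me" flag to the caller is an assumption
about how deletion might be requested.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link states named in the description; the enum itself is illustrative. */
enum link_state { LINK_RESET, LINK_ESTABLISHING, LINK_ESTABLISHED };

/* Hypothetical, reduced view of a link endpoint. */
struct link_ep {
	enum link_state state;
	bool in_session;
	uint16_t peer_session;
};

/* Reset a link endpoint from any state, including ESTABLISHING.
 * Clearing 'in_session' means the next RESET_MSG is no longer judged
 * against a stale 'peer_session'.
 * Returns true if the caller should go on to delete the link.
 */
static bool link_reset(struct link_ep *ep, bool delete_requested)
{
	ep->state = LINK_RESET;
	ep->in_session = false;
	return delete_requested;
}
```
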
> 
> The solution is backward compatible.
> 
> Acked-by: Jon Maloy <jon.ma...@ericsson.com>
> Acked-by: Ying Xue <ying....@windriver.com>
> Signed-off-by: Tuong Lien <tuong.t.l...@dektech.com.au>

Applied.
