From: Jon Maloy <jon.ma...@ericsson.com>
Date: Mon, 13 Jun 2016 20:46:22 -0400

> TIPC-based clusters are by default set up with full-mesh link
> connectivity between all nodes. Those links are expected to provide
> a short failure detection time, by default set to 1500 ms. Because
> of this, the background load for neighbor monitoring in an N-node
> cluster grows in proportion to N on each node, while the overall
> monitoring traffic through the network infrastructure increases at
> a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
> scale well beyond ~100 nodes unless we significantly increase failure
> discovery tolerance.
> 
> This commit introduces a framework and an algorithm that drastically
> reduce this background load, while basically maintaining the original
> failure detection times across the whole cluster. Using this algorithm,
> background load will now grow at a rate of ~(2 * sqrt(N)) per node, and
> at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will
> now have to actively monitor 38 neighbors in a 400-node cluster, instead
> of 399 as before.
> 
> This "Overlapping Ring Supervision Algorithm" is completely distributed
> and employs no centralized or coordinated state. It goes as follows:
 ...
> Acked-by: Ying Xue <ying....@windriver.com>
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>

Applied, thanks.
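
For a rough sense of the scaling figures quoted above, here is a minimal
sketch that evaluates the two sets of growth formulas from the commit
message. The function names are hypothetical and only restate the
approximations ~(N - 1) and ~(N * (N - 1)) for full mesh, and
~(2 * sqrt(N)) and ~(2 * N * sqrt(N)) for the new scheme; it does not
implement the monitoring algorithm itself.

```python
import math


def full_mesh_per_node(n: int) -> int:
    """Neighbors each node actively monitors with full-mesh supervision."""
    return n - 1


def full_mesh_total(n: int) -> int:
    """Cluster-wide monitoring relations with full-mesh supervision."""
    return n * (n - 1)


def ring_per_node(n: int) -> float:
    """Approximate per-node load quoted in the commit: ~(2 * sqrt(N))."""
    return 2 * math.sqrt(n)


def ring_total(n: int) -> float:
    """Approximate cluster-wide overhead quoted: ~(2 * N * sqrt(N))."""
    return n * ring_per_node(n)


if __name__ == "__main__":
    # Cluster sizes chosen for illustration only.
    for n in (100, 400, 1000):
        print(
            f"N={n}: full mesh {full_mesh_per_node(n)} per node, "
            f"{full_mesh_total(n)} total; "
            f"new scheme ~{ring_per_node(n):.0f} per node, "
            f"~{ring_total(n):.0f} total"
        )
```

For N = 400 the approximation gives 40 monitored neighbors per node,
close to the 38 stated in the commit message, against 399 for full mesh.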
