From: Jon Maloy <jon.ma...@ericsson.com>
Date: Mon, 13 Jun 2016 20:46:22 -0400
> TIPC based clusters are by default set up with full-mesh link
> connectivity between all nodes. Those links are expected to provide
> a short failure detection time, by default set to 1500 ms. Because
> of this, the background load for neighbor monitoring in an N-node
> cluster increases with a factor N on each node, while the overall
> monitoring traffic through the network infrastructure increases at
> a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
> scale well beyond ~100 nodes unless we significantly increase failure
> discovery tolerance.
>
> This commit introduces a framework and an algorithm that drastically
> reduces this background load, while basically maintaining the original
> failure detection times across the whole cluster. Using this algorithm,
> background load will now grow at a rate of ~(2 * sqrt(N)) per node, and
> at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will
> now have to actively monitor 38 neighbors in a 400-node cluster, instead
> of 399 as before.
>
> This "Overlapping Ring Supervision Algorithm" is completely distributed
> and employs no centralized or coordinated state. It goes as follows:
...
> Acked-by: Ying Xue <ying....@windriver.com>
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>

Applied, thanks.
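The per-node figures quoted above (38 monitored neighbors instead of 399 in a 400-node cluster) can be reproduced with a rough counting model. The sketch below assumes the ring of N nodes is split into domains of ceil(sqrt(N)) consecutive nodes, and that each node watches the other members of its own domain plus one head node in every other domain. This domain layout is an illustrative assumption, since the algorithm description is elided from this message, and the function names are hypothetical, not taken from the TIPC code.

```python
import math


def full_mesh_monitored(n: int) -> int:
    """Peers each node actively supervises with full-mesh monitoring."""
    return n - 1


def ring_domain_monitored(n: int) -> int:
    """Rough per-node count under an ASSUMED sqrt(N) domain model.

    Assumption (illustrative, not the kernel implementation): the ring of
    n nodes is split into domains of d = ceil(sqrt(n)) consecutive nodes.
    A node supervises the other d - 1 members of its own domain, plus one
    head node in each of the remaining ceil(n / d) - 1 domains.
    """
    if n < 1:
        raise ValueError("cluster must contain at least one node")
    d = math.isqrt(n - 1) + 1      # ceil(sqrt(n)) using exact integer math
    domains = -(-n // d)           # ceil(n / d)
    return (d - 1) + (domains - 1)


def total_links(per_node: int, n: int) -> int:
    """Cluster-wide supervision relationships: per-node count times n."""
    return per_node * n


if __name__ == "__main__":
    for n in (100, 400, 1000):
        mesh = full_mesh_monitored(n)
        ring = ring_domain_monitored(n)
        print(f"N={n}: full mesh {mesh}/node ({total_links(mesh, n)} total), "
              f"ring model {ring}/node ({total_links(ring, n)} total)")
```

For N = 400 the model gives 20 domains of 20 nodes, so 19 domain members plus 19 remote domain heads, i.e. 38 per node versus 399 with full mesh. Cluster-wide, that is 15200 supervision relationships instead of 159600, consistent with the ~(2 * N * sqrt(N)) versus ~(N * (N - 1)) growth rates stated in the commit message.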