Re: Clarification on intended bootstrapping semantics

Jonathan Ellis Wed, 02 Feb 2011 08:48:29 -0800

On Thu, Jan 20, 2011 at 5:05 AM, Peter Schuller
<peter.schul...@infidyne.com> wrote:
> (1) Starting from scratch with N nodes, M of which are seed nodes (M
> <= N), no initial token and autobootstrap false, starting them all up
> with clean data directories should be supported without concern for
> start-up order.  The cluster is expected to converge. As long as no
> writes are happening, and there is no data in the cluster, there is no
> problem. There will be no "divergent histories" that prevent gossip
> from working.


Right.

> (2) Specifically, given that one wants for example 2 seeds, there is
> no requirement to join the "second" seed as a non-seed *first*, only
> to then restart with it as seed after having joined the cluster.

Right.

> (3) The critical invariant for the operator to maintain with respect
> to seed nodes, is that no node is ever listed as a seed node in another
> node's configuration, without said seed node first having joined the
> cluster.

It's more forgiving than that.  The real critical invariant is that
nodes should not have disjoint sets of seed nodes.  (Which we commonly
interpret as "keep all the seed lists the same.")
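A minimal sketch of how an operator might check that invariant, assuming the
seed lists have already been collected from each node's config into a dict
(the function name and the input shape are hypothetical, not part of
Cassandra):

```python
from itertools import combinations


def disjoint_seed_pairs(seed_lists):
    """Return pairs of node names whose seed sets share no member.

    seed_lists maps a node name to the seeds listed in that node's config
    (a hypothetical input format, gathered by the operator).
    """
    return [
        (a, b)
        for (a, seeds_a), (b, seeds_b) in combinations(sorted(seed_lists.items()), 2)
        if set(seeds_a).isdisjoint(seeds_b)
    ]


configs = {
    "n1": ["10.0.0.1", "10.0.0.2"],
    "n2": ["10.0.0.2"],
    "n3": ["10.0.0.3"],
}
# n3 shares no seed with n1 or n2, so both pairs are reported.
print(disjoint_seed_pairs(configs))
```

An empty result means every pair of nodes has at least one seed in common;
keeping all the seed lists identical satisfies this trivially.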

> (4) It is always fine for a seed node to consider itself a seed even
> during initial start-up and joining the ring.

Yes.

> (5) enabling auto_bootstrap does not just affect the method by which
> tokens are selected, but also affects *whether* the bootstrap process
> includes streaming data from other nodes prior to becoming up in the
> ring (i.e., whether StorageService.bootstrap() is going to be called
> in initServer())

Right.  In fact, the second part is the primary effect, since you
should usually specify initial_token when adding nodes.

> (6) having a node join a pre-existing cluster with data in it without
> auto_bootstrap set to true, would cause the node to join the ring
> but be void of data, thus potentially violating consistency guarantees
> (but recovery is possible by running repair)

Right.

> (7) A consequence of (5)+(6) is that auto_bootstrap should *always* be
> enabled on all nodes in a production cluster, except:
> (7a) New nodes being brought in as seeds

No, this will break things as in (6).  The right way to add a new seed
is to first add it as a non-seed, then update the config files to add
it to the seed list later.

> (7b) During the very first initial cluster setup with no data

Yes.
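The answers to (6), (7a) and (7b) can be summarized as a small decision
helper.  This is only a sketch of the operator procedure described in this
thread; the function name and the step wording are made up for illustration
and are not a Cassandra API:

```python
def join_plan(cluster_has_data, will_be_seed):
    """Return the ordered operator steps for bringing up a node."""
    if not cluster_has_data:
        # (7b): very first cluster setup, there is nothing to stream.
        return ["start with auto_bootstrap disabled"]
    # (6): joining a cluster that holds data without bootstrapping
    # leaves the node void of data, so bootstrap as a non-seed first.
    steps = ["start as a non-seed with auto_bootstrap enabled"]
    if will_be_seed:
        # (7a): promote to seed only after the node has joined.
        steps.append("once it has joined, add it to the seed lists in the config files")
    return steps


for step in join_plan(cluster_has_data=True, will_be_seed=True):
    print(step)
```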

> (8) The above is intended and on purpose, and it would be correct to
> operate under these assumptions when updating/improving documentation.

Yes. :)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
