Re: Seeds, autobootstrap nodes, and replication factor

Benjamin Black Fri, 04 Jun 2010 10:47:10 -0700

On Fri, Jun 4, 2010 at 10:36 AM, Philip Stanhope <pstanh...@wimba.com> wrote:
>
> Here's the scenario: would like R = N where N is the number of nodes. Let's 
> say 8.
>
> 1. Create first node, modify storage-conf.xml and change the <Seed/> to be 
> the ip of the node. Change replication factor to 8 for CF of interest. Start 
> the puppy up.
>


Replication factor (RF) is set per Keyspace, not per CF.

> 2. Create 2nd node, modify storage-conf.xml and change <AutoBootstrap/> to 
> true and let it know the first seed. Ensure replication factor is 8 for the 
> CF of interest. Start the puppy up.
>

If you do it this way, be aware that automatic token assignment may
not do what you want.  It _probably_ will, since 8 is a power of 2,
but check the resulting tokens.
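
A rough sketch of why a power-of-2 node count tends to work out.  It
assumes, as a simplification, that each bootstrapping node takes the
midpoint of the currently largest token range on a ring of size 2**127.
The real choice is driven by node load rather than range width, and
bootstrap_tokens and range_widths are hypothetical helpers, not
Cassandra code.

```python
RING = 2 ** 127  # assumed token space size (RandomPartitioner-style)


def range_widths(tokens):
    """Width of the range from each token to the next one around the ring."""
    ordered = sorted(tokens)
    return [
        # A single node owns the whole ring, hence the "or RING".
        (ordered[(i + 1) % len(ordered)] - t) % RING or RING
        for i, t in enumerate(ordered)
    ]


def bootstrap_tokens(n_nodes):
    """Simulate nodes joining one at a time, each bisecting the widest range."""
    tokens = [0]  # first node; token 0 is an arbitrary choice for the sketch
    while len(tokens) < n_nodes:
        ordered = sorted(tokens)
        widths = range_widths(ordered)
        # Pick the widest range (first one wins on ties) and split it in half.
        i = max(range(len(ordered)), key=lambda k: widths[k])
        tokens.append((ordered[i] + widths[i] // 2) % RING)
    return sorted(tokens)


if __name__ == "__main__":
    for n in (6, 8):
        widths = range_widths(bootstrap_tokens(n))
        print(n, "nodes, widest/narrowest range ratio:", max(widths) // min(widths))
```

Under these assumptions, 8 nodes end up with equal ranges, while a
count such as 6 leaves some nodes owning twice as much of the ring as
others.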

> 3. Create 3rd node.
>
> Q1: Should the node1 and node2 be listed as seeds? Or only node1?
>

Doesn't matter.  Seeds are only used as a discovery mechanism.  One is
sufficient.

> 4. Create 4th node. Same question as before.
>
> Q2: Same question as before ... should the list of seeds grow as the cluster 
> grows? Alternative phrasing ... what is the relationship between Seed and 
> AutoBootstrap, i.e. can a Seed node in fact be a node that was 
> autobootstrapped? Is this considered best practice?
>

Once a node is bootstrapped, auto or otherwise, that's it.  It is now
just another node in the cluster.  How it got that way is not
relevant.

> At this point we've got 4 nodes in the cluster ... I've gotten this far with 
> no problems, loaded with tons of data and compared performance with various 
> replication factors. Seeing faster reads from any particular node (as 
> expected) when the ReplicationFactor is equal to the number of nodes in the 
> cluster. Have compared lots of single update/creates as well as batch_mutate 
> (which is very fast for bootstrapping the CFs -- highly recommended).
>
> And also seeing varying performance on reads (fast, and as expected) when 
> ReplicationFactor < N.
>
> Q3: What, if any issue, is there when R > N?
>

Not recommended.
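
One way to see a problem with R > N, as a minimal sketch.  It assumes
quorum is computed from the configured RF as RF/2 + 1 (integer
division) and that only nodes that actually exist can hold replicas.
quorum and quorum_reachable are hypothetical names; how a given
Cassandra version reacts when the quorum cannot be met is not modelled
here.

```python
def quorum(rf):
    """Number of replicas a quorum operation must reach for a given RF."""
    return rf // 2 + 1


def quorum_reachable(rf, live_nodes):
    """True if enough nodes exist to satisfy a quorum for this RF."""
    replicas_available = min(rf, live_nodes)  # cannot place more replicas than nodes
    return replicas_available >= quorum(rf)


if __name__ == "__main__":
    # RF = 8 while only 4 nodes are up: quorum is 5, but only 4 replicas exist.
    print(quorum(8), quorum_reachable(8, 4))
```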

> This is the situation as you're bringing up nodes in the cluster. And when 
> you take down a node (intentionally or as a failure).
>
> I know one consideration is that if R >= N ... and CF data grows ever bigger 
> ... there will be a hit as the node is created.
>
> Q4: If you know that you're never going to have more than 40 
> (MaxExpectedClusterNodes) in your cluster ... is it safe to set R >= 
> MaxExpectedClusterNodes?
>

Setting it higher is not going to help you.  It is also unclear to me
how having a cluster that large with an RF that high is going to
behave.  Read repair (which happens on every call) is going to be
_brutal_.
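
A back-of-the-envelope sketch of that cost, assuming every read
triggers read repair against all RF replicas and that client reads are
spread evenly over the cluster.  per_node_read_load is a hypothetical
helper and the numbers are illustrative only.

```python
def per_node_read_load(client_reads_per_sec, rf, n_nodes):
    """Average replica reads per second each node serves under the assumptions above."""
    total_replica_reads = client_reads_per_sec * rf  # each read touches RF replicas
    return total_replica_reads / n_nodes


if __name__ == "__main__":
    # RF = N = 40: every node takes part in every single read.
    print(per_node_read_load(1000, 40, 40))
    # RF = 3 on the same 40 nodes: each node sees a small fraction of the reads.
    print(per_node_read_load(1000, 3, 40))
```

With RF equal to the node count, adding nodes does not reduce the read
work each node has to do.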

> Q5: If you set R = MaxExpectedClusterNodes ... and you end up servicing a 
> node .... and bringing up an alternate node in its place ... thus having R = 
> N at all times ... and then you bring up the N+1 node ... will it start to 
> receive the data that it missed while it was down?
>

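This is what the Hinted Handoff mechanism is for: writes destined for
a node that is down are held by another node and delivered once it
comes back.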


b
