Transparently rebalancing a Solr cluster without splitting or moving shards

Damien Dykman Mon, 07 Jul 2014 11:42:12 -0700

I have a cluster of N boxes/nodes and I'd like to add M boxes/nodes and rebalance data accordingly.


Let's add the following constraints:
  - 1. boxes have different characteristics (RAM, CPU, disks)
  - 2. different number of shards per box/node (let's pretend we have found the sweet spot for each box)
  - 3. once rebalancing is over, the layout of the cluster should be the same as if it had been bootstrapped from N+M boxes

Because of the above constraints, shard splitting or moving shards around is not an option. And to keep the discussion simple, let's ignore shard replicas.


So far, the best scenario I could think of is the following:
  - a. 1 collection on the N nodes using implicit routing
  - b. add shards on the M new nodes as part of that collection
  - c. reindex a portion of the data on the shards of the M new nodes, while restricting them from search
  - d. in 1 transaction, delete the old data and immediately issue a soft commit and remove search restrictions
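
To make steps b to d concrete, here is a rough sketch that only builds the requests involved, without sending anything. It assumes the Collections API CREATESHARD action for step b, the "shards" query parameter to keep the new shards out of searches during step c, and a delete-by-query update with softCommit=true for step d. The base URL, the collection name "docs", the shard and node names, and the "migrated_b" field are all made up for illustration. Whether a single update request is really atomic across shards is exactly the open question, so the sketch does not claim it.

```python
import json
from urllib.parse import urlencode

# Assumed base URL; adjust to the actual cluster.
SOLR = "http://localhost:8983/solr"


def create_shard_url(collection, shard, node):
    """Step b: Collections API call adding a named shard on a given node."""
    params = {
        "action": "CREATESHARD",
        "collection": collection,
        "shard": shard,
        "createNodeSet": node,
    }
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def search_params(query, visible_shards):
    """Step c: restrict a search to the shards allowed to serve it."""
    return {"q": query, "shards": ",".join(visible_shards)}


def cutover_request(collection, delete_query):
    """Step d: one update request that deletes old copies and soft-commits."""
    url = f"{SOLR}/{collection}/update?{urlencode({'softCommit': 'true'})}"
    body = json.dumps({"delete": {"query": delete_query}})
    return url, body


# Hypothetical names: two old shards, one new shard on a new node.
old_shards = ["shard1", "shard2"]
print(create_shard_url("docs", "shard3", "host3:8983_solr"))
print(search_params("*:*", old_shards))
# "migrated_b:true" stands in for whatever query selects only the old copies.
print(cutover_request("docs", "migrated_b:true"))
```

Once the cutover request has succeeded, searches would pass old_shards + ["shard3"] (or drop the "shards" parameter entirely) to lift the restriction.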


Any better idea?

I could also use 1 collection per box and have Solr do the routing within each collection. I would still have to handle the routing across collections, but collection aliases would come in handy. But overall, it would be similar to the above scenario. Actually, in my case it wouldn't work as well because I also use some kind of "flag document" on the M new nodes, which I need to update atomically with the delete of the old stuff. And, if I'm not mistaken, I'd lose atomicity with the multi-collection scenario.
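
For the alias variant, searching across the per-box collections could go through a single alias created with the Collections API CREATEALIAS action. A minimal sketch that only builds the URL; the base URL, the alias name "all_boxes" and the collection names are placeholders, not anything from my setup:

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a CREATEALIAS call pointing one alias at several collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Re-issuing the call with a longer collection list repoints the alias.
print(create_alias_url("http://localhost:8983/solr", "all_boxes", ["box1", "box2"]))
```

This would only cover the search side; routing updates to the right collection would still be up to the client.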


Thank you for your feedback,
Damien
