I have a cluster of N boxes/nodes and I'd like to add M boxes/nodes and rebalance data accordingly.

Let's add the following constraints:
  - 1. boxes have different characteristics (RAM, CPU, disks)
  - 2. different number of shards per box/node (let's pretend we have found the sweet spot for each box)
  - 3. once rebalancing is over, the layout of the cluster should be the same as if it had been bootstrapped from N+M boxes

Because of the above constraints, shard splitting or moving shards around is not an option. And to keep the discussion simple, let's ignore shard replicas.
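
To make constraint 3 concrete, here is a minimal sketch of how the client-side routing could be computed. It uses rendezvous (highest-random-weight) hashing, which is purely an assumption for illustration and not something Solr provides for implicit routing. The box names, shard counts, and document ids are made up. With this kind of hashing, a document either stays on its shard or moves to a shard on one of the M new boxes, and the result is the same as routing against N+M boxes from the start.

```python
import hashlib

def shard_names(boxes):
    """Expand {box: shard_count} into a flat list of shard names."""
    return [f"{box}_s{i}" for box, count in boxes.items() for i in range(count)]

def pick_shard(doc_id, shards):
    """Rendezvous hashing: score every shard for this doc, highest score wins."""
    def score(shard):
        return hashlib.md5(f"{shard}:{doc_id}".encode("utf-8")).hexdigest()
    return max(shards, key=score)

# Hypothetical cluster: N = 2 existing boxes, M = 1 new box,
# each with its own "sweet spot" shard count.
old_boxes = {"box1": 2, "box2": 4}
new_boxes = {"box3": 3}

old_shards = shard_names(old_boxes)
all_shards = shard_names({**old_boxes, **new_boxes})
new_only = set(all_shards) - set(old_shards)

# Documents whose target shard changes once the new boxes are added.
doc_ids = [f"doc{i}" for i in range(1000)]
moved = [d for d in doc_ids
         if pick_shard(d, old_shards) != pick_shard(d, all_shards)]
```

The `moved` list is the portion of the data that would have to be reindexed on the new shards and later deleted from the old ones.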

So far, the best scenario I could think of is the following:
  - a. 1 collection on the N nodes using implicit routing
  - b. add shards on the M new nodes as part of that collection
  - c. reindex a portion of the data on the shards of the M new nodes, while restricting them from search
  - d. in 1 transaction, delete the old data and immediately issue a soft commit and remove search restrictions
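
As a rough sketch, the requests behind steps b to d could be built like this. The Solr base URL, the collection name, and the shard and node names are hypothetical. I am assuming `CREATESHARD` for adding a shard to an implicitly routed collection, the `shards` parameter for keeping the new shards out of search, and `_route_` plus `softCommit=true` on the update request for the cut-over. The sketch only builds the requests and does not send them, and it does not by itself show that step d is atomic.

```python
import json
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"   # assumption: base URL of any node
COLLECTION = "mycoll"                 # hypothetical collection name

def create_shard_url(shard, node):
    """Step b: add a named shard on a given new node."""
    params = {"action": "CREATESHARD", "collection": COLLECTION,
              "shard": shard, "createNodeSet": node}
    return f"{SOLR}/admin/collections?{urlencode(params)}"

def restricted_search_url(query, visible_shards):
    """Step c: search only the shards that are already 'live'."""
    params = {"q": query, "shards": ",".join(visible_shards)}
    return f"{SOLR}/{COLLECTION}/select?{urlencode(params)}"

def cutover_request(old_shard, moved_query):
    """Step d: delete the moved docs from an old shard and soft commit
    in the same update request."""
    params = {"_route_": old_shard, "softCommit": "true"}
    url = f"{SOLR}/{COLLECTION}/update?{urlencode(params)}"
    body = json.dumps({"delete": {"query": moved_query}})
    return url, body

# Hypothetical names, for illustration only.
print(create_shard_url("box3_s0", "host3:8983_solr"))
print(restricted_search_url("*:*", ["box1_s0", "box1_s1"]))
print(cutover_request("box1_s0", "moved_b:true"))
```

Once the cut-over is done, searches would simply drop the `shards` restriction.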

Any better idea?

I could also use 1 collection per box and have Solr do the routing within each collection. I would still have to handle the routing across collections, but collection aliases would come in handy. Overall, it would be similar to the above scenario. Actually, in my case it wouldn't work as well, because I also use some kind of "flag document" on the M new nodes which I need to update atomically with the delete of the old data. And, if I'm not mistaken, I'd lose atomicity with the multi-collection scenario.

Thank you for your feedback,
Damien




