Re: Transparently rebalancing a Solr cluster without splitting or moving shards

Damien Dykman Mon, 07 Jul 2014 15:43:19 -0700

Thanks Shawn, clean way to do it, indeed. And going your route, one could even copy the existing shards into the new collection and then delete the data which is getting reindexed on the new nodes. That would spare reindexing everything.

But in my case, I add boxes after a noticeable performance degradation due to data volume increase. So the old boxes cannot afford reindexing data (or deleting it, if using the proposed variation) in the new collection while serving searches with the old collection. Unless there is a way to aggressively bound the RAM consumption of the new collection (disabling MMAP?), given that it's not being used for search during the transition? That said, even if that were possible, both collections would compete for disk I/O.


Thanks,
Damien

On 07/07/2014 12:26 PM, Shawn Heisey wrote:

On 7/7/2014 12:41 PM, Damien Dykman wrote:

I have a cluster of N boxes/nodes and I'd like to add M boxes/nodes
and rebalance data accordingly.

Let's add the following constraints:
   - 1. boxes have different characteristics (RAM, CPU, disks)
   - 2. different number of shards per box/node (let's pretend we have
found the sweet spot for each box)
   - 3. once rebalancing is over, the layout of the cluster should be
the same as if it had been bootstrapped from N+M boxes

Because of the above constraints, shard splitting or moving shards
around is not an option. And to keep the discussion simple, let's
ignore shard replicas.

So far, the best scenario I could think of is the following:
   - a. 1 collection on the N nodes using implicit routing
   - b. add shards on the M new nodes as part of that collection
   - c. reindex a portion of the data on the shards of the M new nodes,
while restricting them from search
   - d. in 1 transaction, delete the old data and immediately issue a
soft commit and remove search restrictions
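
For illustration only, steps b through d could be sketched roughly as below. This is not from the original message: the base URL, the collection, shard and node names, and the delete query are all hypothetical, and the helpers only build the requests without sending them. It assumes the Collections API CREATESHARD action (with createNodeSet), the shards query parameter to restrict searches, and a delete-by-query posted with softCommit=true. Solr has no real transactions, so the last request is only an approximation of "in 1 transaction".

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

SOLR = "http://localhost:8983/solr"  # hypothetical base URL


def create_shard_url(collection, shard, node):
    """Step b: add a shard on a new node (implicit router only)."""
    params = {
        "action": "CREATESHARD",
        "collection": collection,
        "shard": shard,
        "createNodeSet": node,  # assumed: pins the shard to this node
    }
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def search_url(collection, query, shards):
    """Step c: search only the listed shards, leaving the new ones out."""
    params = {"q": query, "shards": ",".join(shards)}
    return f"{SOLR}/{collection}/select?{urlencode(params)}"


def cutover_request(collection, delete_query):
    """Step d: delete the moved data and soft commit in a single request.

    Returns (url, xml_body) for a POST with Content-Type: text/xml.
    The delete query must match only the old copies of the moved
    documents, not the copies reindexed on the new shards.
    """
    url = f"{SOLR}/{collection}/update?{urlencode({'softCommit': 'true'})}"
    body = f"<delete><query>{escape(delete_query)}</query></delete>"
    return url, body
```

After the cutover request, searches would simply drop the shards restriction so that the new shards are included.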

You may not like this answer, but here's a fairly clean way to do this,
assuming you have enough disk space on the existing machines:

1. Add the new boxes to the cluster.
2. Create a new collection across all the boxes.
2a. If your current collection is named "test" then name the new one
     "test0" or something else that's related, but different.
3. Index all data into the new collection.
4. As quickly as possible, do the following actions:
4a. Stop indexing.
4b. Do a synchronization pass on the new collection so it's current.
4c. Delete the original collection.
4d. Create a collection alias so that you can access the new collection
     with the original collection name.
4e. Restart indexing.
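
As a rough sketch of the Collections API calls behind steps 2, 4c and 4d (not part of the original message): the base URL and the shard count below are hypothetical, the collection names follow the "test"/"test0" example, and the helper only builds the URLs. Steps 3, 4a, 4b and 4e depend on the indexing application and are left out.

```python
from urllib.parse import urlencode


def collections_api(base, **params):
    """Build a Collections API URL from keyword parameters."""
    return f"{base}/admin/collections?{urlencode(params)}"


def migration_urls(base, old, new, num_shards):
    """Return the URLs for steps 2, 4c and 4d, in that order."""
    return [
        # Step 2: create the new collection across all the boxes.
        collections_api(base, action="CREATE", name=new, numShards=num_shards),
        # Step 4c: delete the original collection.
        collections_api(base, action="DELETE", name=old),
        # Step 4d: alias the original name to the new collection.
        collections_api(base, action="CREATEALIAS", name=old, collections=new),
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for url in migration_urls("http://localhost:8983/solr", "test", "test0", 8):
        print(url)
```

The alias can only take the original name once the original collection is gone, which is why the delete comes before the alias and why the window between 4a and 4e should be kept short.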


Thanks,
Shawn
