On 7/7/2014 12:41 PM, Damien Dykman wrote:
> I have a cluster of N boxes/nodes and I'd like to add M boxes/nodes
> and rebalance data accordingly.
>
> Let's add the following constraints:
> - 1. boxes have different characteristics (RAM, CPU, disks)
> - 2. different number of shards per box/node (let's pretend we have
>      found the sweet spot for each box)
> - 3. once rebalancing is over, the layout of the cluster should be
>      the same as if it had been bootstrapped from N+M boxes
>
> Because of the above constraints, shard splitting or moving shards
> around is not an option. And to keep the discussion simple, let's
> ignore shard replicas.
>
> So far, the best scenario I could think of is the following:
> - a. 1 collection on the N nodes using implicit routing
> - b. add shards on the M new nodes as part of that collection
> - c. reindex a portion of the data on the shards of the M new nodes,
>      while restricting them from search
> - d. in 1 transaction, delete the old data and immediately issue a
>      soft commit and remove search restrictions

You may not like this answer, but here's a fairly clean way to do this,
assuming you have enough disk space on the existing machines:

1. Add the new boxes to the cluster.
2. Create a new collection across all the boxes.
2a. If your current collection is named "test" then name the new one
    "test0" or something else that's related, but different.
3. Index all data into the new collection.
4. As quickly as possible, do the following actions:
4a. Stop indexing.
4b. Do a synchronization pass on the new collection so it's current.
4c. Delete the original collection.
4d. Create a collection alias so that you can access the new
    collection with the original collection name.
4e. Restart indexing.

Thanks,
Shawn
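
For reference, the delete-then-alias cutover (deleting the original collection and aliasing its name to the new one) might look roughly like the sketch below. It only builds the Solr Collections API request URLs and does not send them. The base URL is an assumption, the collection names "test" and "test0" are the examples from the message, and the helper names are hypothetical. DELETE and CREATEALIAS are Collections API actions; check the parameters against the documentation for your Solr version before relying on them.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL for the given action (hypothetical helper)."""
    query = {"action": action, **params}
    return f"{base_url}/admin/collections?{urlencode(query)}"


def cutover_urls(base_url, original, replacement):
    """Return the two requests for the cutover, in order:
    delete the original collection, then alias its name to the new one."""
    return [
        collections_api_url(base_url, "DELETE", name=original),
        collections_api_url(
            base_url, "CREATEALIAS", name=original, collections=replacement
        ),
    ]


if __name__ == "__main__":
    # Assumed base URL; adjust host and port for your cluster.
    for url in cutover_urls("http://localhost:8983/solr", "test", "test0"):
        print(url)
```

The order matters in this sketch: the alias reuses the original collection's name, so the original collection is deleted first. Indexing should stay stopped until both requests have succeeded.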