On 2/28/2013 4:20 PM, varun srivastava wrote:
We have 10 virtual data centres. It is set up this way because we do
rolling updates: while the 1st DC is being indexed, the other 9 serve traffic.
Indexing one DC takes 2 hours. With a single shard, we used to index one DC
and then quickly replicate the index to the other DCs using a master-slave
setup. With SolrCloud we obviously can't index each DC
sequentially, as that would take 2*10 = 20 hours. So we need a way of indexing 1 DC
and then somehow quickly propagating the index binary to the others. What would
you recommend for SolrCloud?

This is my understanding of how SolrCloud works. If I am wrong about any of this, I'm sure one of the experts will correct me. I'm still learning SolrCloud, so this is an opportunity for me to find out if I understand it right:

SolrCloud is not master-slave. One replica of each shard is designated the leader. I think you can influence which replica becomes leader, but I don't know how to do this.

When you index, the receiving node forwards the request to the leader of the correct shard. The leader then processes the update request locally and sends it to all replicas of that shard, so they all index the same data independently.
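To make that update path concrete, here is a toy model in Python. It is only a sketch of the flow described above, not Solr code: the class names are made up, the first replica is simply assumed to be the leader, and the shard is picked with a CRC32 hash of the document id, which is a stand-in for whatever routing SolrCloud really uses.

```python
import zlib


class Replica:
    """One copy of a shard; keeps its own independent index."""

    def __init__(self, name):
        self.name = name
        self.index = {}

    def index_locally(self, doc):
        self.index[doc["id"]] = doc


class Shard:
    def __init__(self, replicas):
        self.replicas = replicas
        # Assumption for this sketch: the first replica is the leader.
        self.leader = replicas[0]

    def update(self, doc):
        # The leader indexes the document itself...
        self.leader.index_locally(doc)
        # ...then sends the same update to every other replica.
        for replica in self.replicas:
            if replica is not self.leader:
                replica.index_locally(doc)


class Cluster:
    def __init__(self, shards):
        self.shards = shards

    def shard_for(self, doc_id):
        # Hypothetical routing: CRC32 of the id modulo the shard count.
        return self.shards[zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)]

    def receive(self, doc):
        # Any node can receive the request; it is forwarded to the
        # leader of the shard that owns the document.
        shard = self.shard_for(doc["id"])
        shard.update(doc)
        return shard
```

The point of the sketch is that every replica does the indexing work itself; nothing is copied as a finished index during normal operation.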

If a node goes down, the remaining replicas handle requests and continue to process any updates that come in. When the down node comes back up, the leader will see if it can use its transaction log to sync up the recovered node. If it can, it will do so. If it can't, the recovered node is told to replicate the full index from the leader. This means you must have the replication handler enabled on all SolrCloud nodes, even though SolrCloud does not use traditional master/slave roles.
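The recovery decision can be sketched the same way. Again this is a toy model under my own assumptions, not how Solr implements it: the leader keeps a bounded log of recent updates (the `tlog_size` parameter and the per-update version counter are invented for illustration), and a returning replica is caught up from the log only if the log still covers everything it missed.

```python
from collections import deque


class Leader:
    def __init__(self, tlog_size=100):
        self.index = {}
        # Bounded log of the most recent (version, doc) updates.
        self.tlog = deque(maxlen=tlog_size)
        self.version = 0

    def add(self, doc):
        self.version += 1
        self.index[doc["id"]] = doc
        self.tlog.append((self.version, doc))


def recover(leader, replica_index, replica_version):
    """Bring a returning replica up to date; return (method, new_version)."""
    if replica_version == leader.version:
        return "in_sync", replica_version

    # The log is usable only if it still holds the first update the
    # replica missed (replica_version + 1).
    if leader.tlog and leader.tlog[0][0] <= replica_version + 1:
        for version, doc in leader.tlog:
            if version > replica_version:
                replica_index[doc["id"]] = doc
        return "log_replay", leader.version

    # Too far behind: copy the leader's whole index instead.
    replica_index.clear()
    replica_index.update(leader.index)
    return "full_replication", leader.version
```

A replica that was down briefly gets only the missed updates; one that was down for longer than the log covers gets a full copy, which is the case that needs the replication handler.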

If the leader goes down, the remaining replicas elect a new leader.

If you want to continue using master/slave semantics, I don't think you can use SolrCloud. Because every update is sent to every replica of the shard, a SolrCloud cluster that spans your DCs will generate inter-DC traffic at all times, which you probably want to avoid.

Thanks,
Shawn
