On 9/9/2014 9:20 AM, Salman Akram wrote:
> So realistically speaking you cannot have SolrCloud work for 2 data centers
> as a redundant solution because no matter how many nodes you add you still
> would need at least 1 node in the 2nd center working too.

Precisely.

> So that just leaves with non-SolrCloud solutions.
>
> "1) Change the replication config to redefine the master and reload the core
> or restart Solr."
>
> That of course is a simple way but the real issue is about the possible
> issues and some good practices e.g. normally the scenario would be that
> primary data center goes down for few hours and till then we upgrade one of
> the slaves in secondary to a master. Now
>
> - IF there is no lag there won't be any issue in secondary at least but
> what if there is lag and one of the files is not completely replicated?
> That file would be discarded or there is a possibility that whole index is
> not usable?
>
> - Once the primary comes back how would we now copy the delta from
> secondary? Make it a slave of secondary first, replicate the delta and then
> set it as a master again?

If you're handling all your replication yourself with the HTTP API, then
you would contact the old master when it comes back up and ask it to
replicate from the temporary master. Then you switch modes in your
program that drives the replication and have it use the original master
for all replication.

If you need to switch masters for non-cloud setups, it's really not
practical to have Solr be in control of the replication, because you
have to modify the config in place and kick Solr to make it re-read the
config. It's extremely messy and prone to error.

As for an incomplete replication ... I do not know this for sure, but I
would imagine that if a replication is not complete, Solr won't switch
indexes; it will keep going with the one it's already got.

> In other words is there a good guide out there for this with possible
> issues and solutions? Definitely before SolrCloud people would be doing
> this and even now SolrCloud doesn't seem practical in quite a few
> situations.

SolrCloud relies on ZooKeeper to maintain the cluster.
It knows how to deal with the *Solr* parts of a distributed cluster, but
it leaves the management of the cluster itself to ZooKeeper -- they've
been doing it a lot longer than we have, so we can use a wheel that's
already invented instead of building it ourselves. Because ZooKeeper
prizes a guaranteed quorum above all else, its design is not well-suited
for a two-datacenter solution. SolrCloud works really well within a
single data center, or with three.

Thanks,
Shawn
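
A rough sketch of the failback order described above, for the case where
an external program drives replication over HTTP. It assumes the stock
replication handler is reachable at /replication on each core and accepts
command=fetchindex with a masterUrl override. The host names, core name,
and helper functions are made up for illustration, and none of this has
been run against a live cluster.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(core_url):
    """Replication handler URL for a core (assumes the default /replication path)."""
    return core_url.rstrip("/") + "/replication"


def fetchindex_url(puller_core_url, source_core_url):
    """Build the request that asks `puller` to copy the index from `source`."""
    params = urlencode({
        "command": "fetchindex",
        "masterUrl": replication_url(source_core_url),
        "wt": "json",
    })
    return replication_url(puller_core_url) + "?" + params


def failback_plan(old_master, temp_master, slaves):
    """Ordered requests: old master catches up first, then slaves use it again."""
    plan = [fetchindex_url(old_master, temp_master)]
    plan += [fetchindex_url(slave, old_master) for slave in slaves]
    return plan


def run(plan, timeout=30):
    """Send each request in order (hypothetical driver, no error handling).

    fetchindex returns before the copy finishes, so a real driver would
    poll command=details on the puller before moving to the next step.
    """
    for url in plan:
        with urlopen(url, timeout=timeout) as resp:
            resp.read()


if __name__ == "__main__":
    # Hypothetical hosts: dc1 is the recovered primary, dc2 the temporary master.
    for url in failback_plan(
        "http://dc1-solr1:8983/solr/core1",
        "http://dc2-solr1:8983/solr/core1",
        ["http://dc2-solr1:8983/solr/core1", "http://dc2-solr2:8983/solr/core1"],
    ):
        print(url)
```

Keeping the plan as a list of URLs means the ordering can be inspected
(or logged) before anything is sent, which matters because indexing
should stay stopped on the temporary master while the old master catches
up.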
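
To make the quorum point concrete, here is a small arithmetic sketch. It
assumes a plain majority quorum (more than half of the ensemble must be
reachable) and that all ZooKeeper nodes are voting members; the function
names are illustrative only.

```python
def quorum(total_nodes):
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def survives_loss(nodes_per_dc, lost_dc):
    """True if the ensemble still has a majority after one data center is lost."""
    total = sum(nodes_per_dc)
    remaining = total - nodes_per_dc[lost_dc]
    return remaining >= quorum(total)


def tolerates_any_dc_loss(nodes_per_dc):
    """True only if losing any single data center leaves a majority."""
    return all(survives_loss(nodes_per_dc, i) for i in range(len(nodes_per_dc)))


if __name__ == "__main__":
    print(tolerates_any_dc_loss([3, 2]))     # False: losing the 3-node DC leaves 2 of 5
    print(tolerates_any_dc_loss([2, 2]))     # False: either loss leaves 2 of 4
    print(tolerates_any_dc_loss([1, 1, 1]))  # True: any loss leaves 2 of 3
```

With two data centers, whichever one holds at least half of the nodes
takes the majority with it when it goes down, no matter how many nodes
are added. A third location breaks that tie.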