On 3/6/2014 7:54 AM, perdurabo wrote:
> Toby Lazar wrote
>> Unless Solr is your system of record, aren't you already replicating your
>> source data across the WAN? If so, could you load Solr in colo B from
>> your colo B data source? You may be duplicating some indexing work, but
>> at least your colo B Solr would be more closely in sync with your colo B
>> data.
>
> Our system of record exists in a SQL DB that is indeed replicated via
> always-on mirroring to the failover data center. However, a complete forced
> re-index of all of the data could take hours and our SLA requires us to be
> back up with searchable indices in minutes. Because we may have to
> replicate multiple data centers' data (three plus data centers, A, B and the
> failover DC) into this failover data center, we can't dedicate the failover
> data center's SolrCloud to constantly re-index data from a single SQL mirror
> when we could potentially need it to take over for any given one.

There are a lot of issues with availability and multiple data centers
that must be addressed before SolrCloud can handle this all internally.
Until that day comes, here's what I would do:

Have a SolrCloud install at each online data center, just as you already
do. It should have collection names that are unique to the functions of
that DC, and may include the DC name. If you MUST have the same
collection name in all online data centers despite there being different
data, you can use collection aliasing. The actual collection name would
be something like stuff_dca, but you'd have an alias called stuff that
can be used for both indexing and querying.

You would need to index the data for all data centers to the SolrCloud
install at the failover DC. Ideally that would be done from the failover
DC's SQL, not over the WAN ... but it really wouldn't matter. Because
each production DC collection will have a unique name, all collections
can coexist on the failover SolrCloud. If a failover becomes necessary,
you can create or change any required collection aliases on the fly.

Although I don't use SolrCloud, and I don't have multiple data centers,
my own index uses a similar paradigm. I have two completely independent
copies of my index. My indexing program knows about them both and
indexes them independently.

There is another benefit to this: I can make changes (Solr upgrades, new
config/schema, a complete rebuild, etc.) to one copy of my index without
affecting the search application at all. By simply enabling or disabling
the ping handler in Solr, my load balancer will keep requests going to
whichever copy I choose.

Thanks,
Shawn
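
P.S. For reference, here is a rough sketch of the two HTTP calls the advice above relies on: pointing an alias at a DC-specific collection, and enabling or disabling the ping handler. It only builds the URLs and does not send anything. The host, port, the collection name stuff_dca, the alias stuff, and the core name stuff_copy1 are placeholders for illustration. It assumes the Collections API CREATEALIAS action is available on your Solr version, and that the ping handler has a healthcheckFile configured so that action=enable and action=disable work; check both against your install before relying on this.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collections):
    """Build a Collections API URL that creates or repoints an alias.

    solr_base is assumed to look like http://host:8983/solr (placeholder).
    Calling CREATEALIAS with an existing alias name repoints that alias.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


def ping_toggle_url(solr_base, core, enable):
    """Build a ping handler URL that enables or disables the health check.

    Assumes the core's ping handler is configured with a healthcheckFile;
    without one, the enable/disable actions are not available.
    """
    action = "enable" if enable else "disable"
    query = urlencode({"action": action})
    return f"{solr_base.rstrip('/')}/{core}/admin/ping?{query}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # placeholder host and port
    # On failover, point the generic alias at DC A's collection.
    print(create_alias_url(base, "stuff", ["stuff_dca"]))
    # Take one index copy out of the load balancer rotation.
    print(ping_toggle_url(base, "stuff_copy1", enable=False))
```

Sending a GET to the first URL on the failover SolrCloud would make stuff resolve to stuff_dca there; sending the second would make the load balancer's ping check fail for that copy, so traffic moves to the other one.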