Re: Replicating Between Solr Clouds

Shawn Heisey Thu, 06 Mar 2014 08:34:49 -0800

On 3/6/2014 7:54 AM, perdurabo wrote:
> Toby Lazar wrote
>> Unless Solr is your system of record, aren't you already replicating your
>> source data across the WAN?  If so, could you load Solr in colo B from
>> your colo B data source?  You may be duplicating some indexing work, but
>> at least your colo B Solr would be more closely in sync with your colo B
>> data.
> 
> Our system of record exists in a SQL DB that is indeed replicated via
> always-on mirroring to the failover data center.  However, a complete forced
> re-index of all of the data could take hours and our SLA requires us to be
> back up with searchable indices in minutes.  Because we may have to
> replicate multiple data centers' data (three plus data centers, A, B and the
> failover DC) into this failover data center, we can't dedicate the failover
> data center's SolrCloud to constantly re-index data from a single SQL mirror
> when we could potentially need it to take over for any given one.


There are a lot of issues with availability and multiple data centers
that must be addressed before SolrCloud can handle this all internally.

Until that day comes, here's what I would do:

Have a SolrCloud install at each online data center, just as you already
do.  It should have collection names that are unique to the functions of
that DC, and may include the DC name.  If you MUST have the same
collection name in all online data centers despite there being different
data, you can use collection aliasing.  The actual collection name would
be something like stuff_dca, but you'd have an alias called stuff that
can be used for both indexing and querying.

You would need to index the data for all data centers to the SolrCloud
install at the failover DC.  Ideally that would be done from the
failover DC's SQL, not over the WAN ... but it really wouldn't matter.
Because each production DC collection will have a unique name, all
collections can coexist on the failover SolrCloud.  If a failover
becomes necessary, you can create or change any required collection
aliases on the fly.
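
A rough sketch of the alias step, for illustration only.  It builds the
Collections API CREATEALIAS request that would point the alias "stuff"
at "stuff_dca" on the failover cloud.  The host name and the helper name
create_alias_url are made up for this example; the alias and collection
names are the ones used above.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collections):
    """Build a Collections API CREATEALIAS request URL.

    CREATEALIAS replaces an existing alias of the same name, so the same
    call serves for both creating and repointing an alias.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    # safe="," keeps the comma-separated collection list unescaped.
    query = urlencode(params, safe=",")
    return solr_base.rstrip("/") + "/admin/collections?" + query


# Hypothetical failover cloud that already holds stuff_dca and stuff_dcb.
FAILOVER_SOLR = "http://solr-failover.example.com:8983/solr"

# DC A has gone down: point the generic alias at DC A's collection.
print(create_alias_url(FAILOVER_SOLR, "stuff", ["stuff_dca"]))
```

Requesting the printed URL (with curl or a browser) against the failover
cloud is what actually makes the change; the sketch only builds the
request.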

Although I don't use SolrCloud, and I don't have multiple data centers,
my own index uses a similar paradigm.  I have two completely independent
copies of my index.  My indexing program knows about them both and
indexes them independently.
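
This is not my actual indexing program, just a minimal sketch of the
idea under some assumptions: each copy is reachable at its own update
handler URL, documents are sent as JSON, and commits are handled
elsewhere (autoCommit, for example).  The names post_docs and
index_to_all are hypothetical.

```python
import json
import urllib.request


def post_docs(update_url, docs):
    """POST a JSON list of documents to a Solr update handler URL."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()


def index_to_all(docs, targets, send=post_docs):
    """Send the same batch to every copy, each one independently.

    A failure on one copy is recorded and does not stop the others.
    Returns {copy_name: None on success, or the error message}.
    """
    results = {}
    for name, update_url in targets.items():
        try:
            send(update_url, docs)
            results[name] = None
        except Exception as exc:  # record and continue with the next copy
            results[name] = str(exc)
    return results
```

A real program would also need to remember which batches failed on which
copy so they can be retried later; that bookkeeping is left out here.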

There is another benefit to this: I can make changes (Solr upgrades, new
config/schema, a complete rebuild, etc.) to one copy of my index without
affecting the search application at all.  By simply enabling or
disabling the ping handler in Solr, my load balancer will keep requests
going to whichever copy I choose.
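
For illustration, the ping handler requests look roughly like this.  The
sketch assumes the PingRequestHandler is at its usual /admin/ping path
and has a healthcheckFile configured, which is what makes the enable and
disable actions available.  The host, core names, and the helper name
ping_url are made up.

```python
from urllib.parse import urlencode

# Actions used here; a plain request with no action is the health check
# that the load balancer would poll.
PING_ACTIONS = {"enable", "disable", "status"}


def ping_url(core_url, action=None):
    """Build a ping handler URL for one core, optionally with an action."""
    url = core_url.rstrip("/") + "/admin/ping"
    if action is None:
        return url
    if action not in PING_ACTIONS:
        raise ValueError("unsupported ping action: " + action)
    return url + "?" + urlencode({"action": action})


# Hypothetical copies of the index.
COPY_A = "http://idx-a.example.com:8983/solr/main"
COPY_B = "http://idx-b.example.com:8983/solr/main"

# Take copy A out of rotation and keep copy B in.
print(ping_url(COPY_A, "disable"))
print(ping_url(COPY_B, "enable"))
```

Once a copy is disabled, its ping request returns an error status, so a
load balancer that health-checks the ping URL stops sending it traffic.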

Thanks,
Shawn
