Toby Lazar wrote:
> Unless Solr is your system of record, aren't you already replicating your
> source data across the WAN?  If so, could you load Solr in colo B from
> your colo B data source?  You may be duplicating some indexing work, but
> at least your colo B Solr would be more closely in sync with your colo B
> data.

Our system of record is a SQL DB that is indeed replicated via always-on
mirroring to the failover data center.  However, a complete forced re-index
of all of the data could take hours, and our SLA requires us to be back up
with searchable indices in minutes.  Because we may have to replicate
multiple data centers' data (three or more: A, B, and the failover DC
itself) into this failover data center, we can't dedicate the failover data
center's SolrCloud to constantly re-indexing data from a single SQL mirror
when we could potentially need it to take over for any one of them.

One thought we had was to have DCs A and B run a cron job that forces a
backup of the indices using the "replication?command=backup" API command,
and then sync those backup snapshots over to the failover DC's shut-down
SolrCloud instance, into a separate filesystem directory dedicated to DC
A's or DC B's indices.  Then in the case of a failover we would run a
script that symlinks the snapshots for the particular DC we are failing
over for into the index dir of the failover DC's SolrCloud, and then starts
up the nodes.  The problem comes with how to handle different indices on
different nodes in the SolrCloud when we have two shards: we would have to
do a 1:1 copy from each of the four nodes in DCs A and B to the
corresponding node in the failover DC.  Sounds pretty ugly.
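That cron/backup/rsync flow, sketched very roughly in shell (every host,
path, and core name below is a made-up placeholder, and the commands are
echoed rather than executed so the sketch can be dry-run):

```shell
#!/bin/sh
# Per-DC cron job sketch: snapshot a core via the replication handler,
# then ship the snapshot to this DC's dedicated directory on the failover
# host.  All hosts/paths are hypothetical placeholders.
SOLR_HOST="solr-a1.example.com:8983"
CORE="collection1"
BACKUP_DIR="/data/solr/backups"
FAILOVER_HOST="failover.example.com"

# 1. Trigger a snapshot; numberToKeep=1 prunes older snapshots.
BACKUP_URL="http://${SOLR_HOST}/solr/${CORE}/replication?command=backup&location=${BACKUP_DIR}&numberToKeep=1"
echo "curl -s '${BACKUP_URL}'"

# 2. Sync the snapshot dir to the failover DC (one target dir per source DC).
echo "rsync -a --delete ${BACKUP_DIR}/ ${FAILOVER_HOST}:/data/solr/dc-a/${CORE}/"
```

The failover script would then symlink the shipped snapshot into each
node's index dir before starting Solr, which is exactly the 1:1 node
mapping that makes this ugly.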

Looking at this thread, even this plan may not work:
http://lucene.472066.n3.nabble.com/solrcloud-shards-backup-restoration-td4088447.html

As far as the SolrEntityProcessor goes, I'm not sure how you would
configure it.  From what I gather, you have to configure a new
requestHandler section in your solrconfig.xml like this:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/data/solr/mysolr/conf/data-config.xml</str>
  </lst>
</requestHandler>

And then you have to configure a "/data/solr/mysolr/conf/data-config.xml"
with the following contents:

<dataConfig>
  <document>
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://solrsource.example.com:8983/solr/" query="*:*"/>
  </document>
</dataConfig>
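If the handler did load, the import itself would then be kicked off with
requests like these (host and core are placeholders; the commands are
echoed so this can be dry-run):

```shell
#!/bin/sh
# DIH is a placeholder base URL for the /dataimport handler defined above.
DIH="http://localhost:8983/solr/collection1/dataimport"

# Start a full import, clearing the index first and committing at the end.
echo "curl '${DIH}?command=full-import&clean=true&commit=true'"

# Poll for progress/completion.
echo "curl '${DIH}?command=status'"
```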

However, this doesn't seem to work for me since I'm using SolrCloud with
ZooKeeper.  I created these files in my conf directory, uploaded them to
ZooKeeper, and reloaded the collection/cores, but all I got were
initialization errors.  I don't think the docs assume you'll be doing this
in a SolrCloud scenario.
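For reference, the upload/reload sequence I tried was roughly the
following (zkhost, confdir, and collection names are placeholders standing
in for my environment; commands echoed for a dry run):

```shell
#!/bin/sh
# Placeholder names standing in for my environment.
ZKHOST="zk1.example.com:2181"
CONFDIR="/data/solr/mysolr/conf"
CONFNAME="mysolr"
COLLECTION="mycollection"

# Push the conf dir (solrconfig.xml + data-config.xml) up to ZooKeeper...
echo "zkcli.sh -zkhost ${ZKHOST} -cmd upconfig -confdir ${CONFDIR} -confname ${CONFNAME}"

# ...then reload the collection so every core picks up the new config.
echo "curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=${COLLECTION}'"
```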

Any other insight?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replicating-Between-Solr-Clouds-tp4121196p4121685.html
Sent from the Solr - User mailing list archive at Nabble.com.