Re: Correct approach to copy index between solr clouds?

Erick Erickson Sat, 26 Aug 2017 09:26:41 -0700

Approach 2 is sufficient. You do have to ensure that you don't copy
over the write.lock file, however, as you may not be able to start
replicas if it's there.
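
As a minimal sketch of the copy step, assuming the index from Cloud B is
already reachable as a local directory and the target index directory on
Cloud A has been removed (as in Approach 2), something like the following
skips the lock file. The function name and paths are hypothetical:

```python
import shutil
from pathlib import Path


def copy_index_without_lock(src_index: Path, dst_index: Path) -> None:
    """Copy an index directory tree, leaving out any write.lock file."""
    # copytree expects dst_index not to exist yet, which matches removing
    # the old data folder first. ignore_patterns filters out every entry
    # named write.lock at any depth of the tree.
    shutil.copytree(
        src_index,
        dst_index,
        ignore=shutil.ignore_patterns("write.lock"),
    )
```

Run it once per replica with Solr stopped on the target node, e.g.
copy_index_without_lock(Path("/backup/shard1/index"), Path("/var/solr/data/core1/data/index")),
where both paths are placeholders for your own layout.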

There's a relatively little-known third option. You can (ab)use the
replication API "fetchindex" command, see:
https://cwiki.apache.org/confluence/display/solr/Index+Replication to
pull the index from Cloud B to replicas on Cloud A. That has the
advantage of working even if you are actively indexing to Cloud B.
NOTE: currently you cannot _query_ Cloud A (the target) while the
fetchindex is going on, but I doubt you really care since you were
talking about having Cloud A offline anyway. So for each replica you
fetch to, you'll send the fetchindex command directly to the replica on
Cloud A, and the "masterUrl" parameter will point at the corresponding
replica on Cloud B.
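
A rough sketch of building that request, assuming the replication handler
takes command=fetchindex plus a masterUrl parameter as described on the
Index Replication page. The helper name, hosts, and core names are
hypothetical, and the exact form expected for the source URL (with or
without a trailing /replication) may depend on your Solr version:

```python
from urllib.parse import urlencode


def fetchindex_url(target_core_url: str, source_url: str) -> str:
    """Build the fetchindex URL to send to one replica on Cloud A.

    target_core_url: base URL of the core on Cloud A that should pull.
    source_url: URL of the corresponding replica on Cloud B, passed
                through verbatim as the masterUrl parameter.
    """
    params = {"command": "fetchindex", "masterUrl": source_url}
    # urlencode percent-escapes the nested URL so it survives as one value.
    return f"{target_core_url.rstrip('/')}/replication?{urlencode(params)}"


# Example with placeholder hosts and core names (nothing is sent here):
url = fetchindex_url(
    "http://cloud-a-host:8983/solr/coll_shard1_replica1",
    "http://cloud-b-host:8983/solr/coll_shard1_replica1/replication",
)
```

The resulting URL can then be issued with any HTTP client, once per
target replica.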

Finally, what I'd really do is _only_ have one replica for each shard
on Cloud A active and fetch to _that_ replica. I'd also delete the
data dir on all the other replicas for the shard on Cloud A. Then as
you bring the additional replicas up they'll do a full sync from the
leader.
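
To make the bookkeeping concrete, here is a small sketch that splits the
replicas of each shard on Cloud A into the one that receives the fetch
and the ones whose data dirs get deleted. The function and replica names
are hypothetical, and picking the first replica in the list is an
arbitrary assumption; any single replica per shard would do:

```python
from typing import Dict, List, Tuple


def plan_restore(
    replicas_by_shard: Dict[str, List[str]],
) -> Dict[str, Tuple[str, List[str]]]:
    """Map each shard to (replica to fetch into, replicas to wipe)."""
    plan = {}
    for shard, replicas in replicas_by_shard.items():
        if not replicas:
            raise ValueError(f"shard {shard!r} has no replicas")
        # Keep exactly one replica active as the fetch target; the rest
        # have their data dirs removed and resync when they are restarted.
        plan[shard] = (replicas[0], list(replicas[1:]))
    return plan
```

For a shard with replicas ["r1", "r2", "r3"], this yields "r1" as the
fetch target and ["r2", "r3"] as the replicas to wipe.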

FWIW,
Erick

On Fri, Aug 25, 2017 at 6:53 PM, Wei <weiwan...@gmail.com> wrote:
> Hi,
>
> In our set up there are two solr clouds:
>
> Cloud A:  production cloud serves both writes and reads
>
> Cloud B:  back up cloud serves only writes
>
> Cloud A and B have the same shard configuration.
>
> Write requests are sent to both cloud A and B. In certain circumstances
> when Cloud A's update lags behind,  we want to bulk copy the binary index
> from B to A.
>
> We have tried two approaches:
>
> Approach 1.
>       For cloud A:
>       a. delete collection to wipe out everything
>       b. create new collection (data is empty now)
>       c. shut down solr server
>       d. copy binary index from cloud B to corresponding shard replicas in
> cloud A
>       e. start solr server
>
> Approach 2.
>       For cloud A:
>       a.  shut down solr server
>       b.  remove the whole 'data' folder under index/  in each replica
>       c.  copy binary index from cloud B to corresponding shard replicas in
> cloud A
>       d.  start solr server
>
> Is approach 2 sufficient?  I am wondering if delete/recreate collection
> each time is necessary to get cloud into a "clean" state for copy binary
> index between solr clouds.
>
> Thanks for your advice!
