Re: How to restore an index from a backup over HTTP

Greg Solovyev Sat, 16 Aug 2014 03:04:43 -0700

Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty 
straightforward, but the main concern I have is the internal data format that 
ReplicationHandler and SnapPuller use. This new handler as well as the code 
that I've already written to download the index files from Solr will depend on 
that format. Unfortunately, this format is not documented and is not abstracted 
by SolrJ, so I wonder what I can do to make sure it does not change on us 
without notice.

Thanks,
Greg

----- Original Message -----
From: "Shawn Heisey" <s...@elyograg.org>
To: solr-user@lucene.apache.org
Sent: Friday, August 15, 2014 7:31:19 PM
Subject: Re: How to restore an index from a backup over HTTP

On 8/15/2014 5:51 AM, Greg Solovyev wrote:
> What I want to achieve is being able to send the backed up index to Solr 
> (either standalone or with ZooKeeper) in a way similar to creating a new 
> Collection. I.e. create a new collection and upload an existing index directly 
> into that Collection. I've looked through Solr code and so far I have not 
> found a handler that would allow this scenario. So, the last idea is to 
> implement a special handler for this case, perhaps extending 
> CoreAdminHandler. ReplicationHandler together with SnapPuller do pretty much 
> what I need to do, except that the action has to be initiated by the 
> receiving Solr server and I need to initiate the action externally. I.e., 
> instead of having Solr slave download an index from Solr master, I need to 
> feed the index to Solr master and ideally this would work the same way in 
> standalone and SolrCloud modes. 

I have not made any attempt to verify what I'm stating below.  It may
not work.

What I think I would *try* is setting up a standalone Solr (no cloud) on
the backup server.  Use scripted index/config copies and Solr start/stop
actions to get the index up and running on a known core in the
standalone Solr.  Then use the replication handler's HTTP API to
replicate the index from that standalone server to each of the replicas
in your cluster.

https://wiki.apache.org/solr/SolrReplication#HTTP_API
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
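
A minimal, untested sketch of that last step, assuming the replication
handler's documented "fetchindex" command with its "masterUrl" parameter.
The host names and core names in the usage note are hypothetical, and the
sketch assumes masterUrl should point at the source core's /replication
endpoint.  Whether SolrCloud tolerates this on a live replica is exactly
the open question discussed below.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_fetchindex_url(replica_core_url, source_core_url):
    """Build the URL that asks one core to pull an index from another.

    Both arguments are base core URLs such as http://host:8983/solr/core1
    (hypothetical values; adjust to your own deployment).
    """
    query = urlencode({
        "command": "fetchindex",
        # Assumption: masterUrl is the source core's replication endpoint.
        "masterUrl": source_core_url.rstrip("/") + "/replication",
        "wt": "json",
    })
    return replica_core_url.rstrip("/") + "/replication?" + query


def trigger_fetchindex(replica_core_url, source_core_url, timeout=30):
    """Send the fetchindex request and return the raw response body."""
    url = build_fetchindex_url(replica_core_url, source_core_url)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, trigger_fetchindex("http://replica1:8983/solr/mycoll_shard1_replica1",
"http://backuphost:8983/solr/restore") would ask that replica to pull from
the standalone core; repeat the call for each replica.  The fetch runs in
the background on the receiving side, so the response only confirms that
the request was accepted, not that the copy has finished.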

One thing that I do not know is whether SolrCloud itself might interfere
with these actions, or whether it might automatically take care of
additional replicas if you replicate to the shard leader.  If SolrCloud
*would* interfere, then this idea might need special support in
SolrCloud, perhaps as an extension to the Collections API.  If it won't
interfere, then the use-case would need to be documented (on the user
wiki at a minimum) so that committers will be aware of it and preserve
the capability in future versions.  An extension to the Collections API
might be a good idea either way -- I've seen a number of questions about
capability that falls under this basic heading.

Thanks,
Shawn
