Re: How to restore an index from a backup over HTTP

Greg Solovyev Thu, 04 Sep 2014 15:34:56 -0700

Thanks Jeff!

Thanks,
Greg


----- Original Message -----
From: "Jeff Wartes" <jwar...@whitepages.com>
To: solr-user@lucene.apache.org
Sent: Wednesday, August 20, 2014 10:36:07 AM
Subject: Re: How to restore an index from a backup over HTTP

Here’s the repo:
https://github.com/whitepages/solrcloud_manager


Comments/Issues/Patches welcome.


On 8/18/14, 11:28 AM, "Greg Solovyev" <g...@zimbra.com> wrote:

>Thanks Jeff, I'd be interested in taking a look at the code for this
>tool. My github ID is grishick.
>
>Thanks,
>Greg
>
>----- Original Message -----
>From: "Jeff Wartes" <jwar...@whitepages.com>
>To: solr-user@lucene.apache.org
>Sent: Monday, August 18, 2014 9:49:28 PM
>Subject: Re: How to restore an index from a backup over HTTP
>
>I'm able to do cross-solrcloud-cluster index copy using nothing more than
>careful use of the "fetchindex" replication handler command.
>
>I'm using this as a build/deployment tool, so I manually create a
>collection in two clusters, index into one, test, and then ask the other
>cluster to fetchindex from it on each shard/replica.
>
>Some caveats:
>  1. It seems like fetchindex may silently decline if it thinks the index
>it has is newer.
>  2. I'm not doing this on an index that's currently receiving updates.
>  3. SolrCloud replication doesn't come into this flow, even if you
>fetchindex on a leader. (although once you're done, updates should get
>replicated normally)
>  4. Both collections must be created with the same number of shards and
>sharding mechanism. (although replication factor can vary)
> 
>
>I've got a tool for automating this that I'd like to push to github at
>some point, let me know if you're interested.
>
>On 8/16/14, 3:03 AM, "Greg Solovyev" <g...@zimbra.com> wrote:
>
>>Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty
>>straightforward, but the main concern I have is the internal data format
>>that ReplicationHandler and SnapPuller use. This new handler as well as
>>the code that I've already written to download the index files from Solr
>>will depend on that format. Unfortunately, this format is not documented
>>and is not abstracted by SolrJ, so I wonder what I can do to make sure it
>>does not change on us without notice.
>>
>>Thanks,
>>Greg
>>
>>----- Original Message -----
>>From: "Shawn Heisey" <s...@elyograg.org>
>>To: solr-user@lucene.apache.org
>>Sent: Friday, August 15, 2014 7:31:19 PM
>>Subject: Re: How to restore an index from a backup over HTTP
>>
>>On 8/15/2014 5:51 AM, Greg Solovyev wrote:
>>> What I want to achieve is being able to send the backed up index to
>>>Solr (either standalone or with ZooKeeper) in a way similar to creating
>>>a new Collection. I.e. create a new collection and upload an existing
>>>index directly into that Collection. I've looked through Solr code and
>>>so far I have not found a handler that would allow this scenario. So,
>>>the last idea is to implement a special handler for this case, perhaps
>>>extending CoreAdminHandler. ReplicationHandler together with SnapPuller
>>>do pretty much what I need to do, except that the action has to be
>>>initiated by the receiving Solr server and I need to initiate the action
>>>externally. I.e., instead of having Solr slave download an index from
>>>Solr master, I need to feed the index to Solr master and ideally this
>>>would work the same way in standalone and SolrCloud modes.
>>
>>I have not made any attempt to verify what I'm stating below.  It may
>>not work.
>>
>>What I think I would *try* is setting up a standalone Solr (no cloud) on
>>the backup server.  Use scripted index/config copies and Solr start/stop
>>actions to get the index up and running on a known core in the
>>standalone Solr.  Then use the replication handler's HTTP API to
>>replicate the index from that standalone server to each of the replicas
>>in your cluster.
>>
>>https://wiki.apache.org/solr/SolrReplication#HTTP_API
>>https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
>>
>>One thing that I do not know is whether SolrCloud itself might interfere
>>with these actions, or whether it might automatically take care of
>>additional replicas if you replicate to the shard leader.  If SolrCloud
>>*would* interfere, then this idea might need special support in
>>SolrCloud, perhaps as an extension to the Collections API.  If it won't
>>interfere, then the use-case would need to be documented (on the user
>>wiki at a minimum) so that committers will be aware of it and preserve
>>the capability in future versions.  An extension to the Collections API
>>might be a good idea either way -- I've seen a number of questions about
>>capability that falls under this basic heading.
>>
>>Thanks,
>>Shawn
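
The per-core "fetchindex" flow discussed in this thread could be sketched
roughly as below. This is only an illustration, not the code from
solrcloud_manager. It assumes the replication handler is reachable at
<core URL>/replication and accepts the "command=fetchindex" and "masterUrl"
parameters described on the SolrReplication wiki page linked above. The host
names, core names, and the helper names fetchindex_url and copy_index are
hypothetical. Because fetchindex may silently decline when it thinks the
local index is newer, a real tool would also want to verify the result
afterwards; that check is not shown here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def fetchindex_url(target_core_url, source_core_url):
    """Build the URL that asks the target core to pull the source core's index.

    Both arguments are base core URLs such as http://host:8983/solr/corename
    (hypothetical values). The request is sent to the *target* core, which
    then fetches from the source, matching the pull model in the thread.
    """
    params = {
        "command": "fetchindex",
        # Assumption: the source is addressed through its replication handler.
        "masterUrl": source_core_url.rstrip("/") + "/replication",
        "wt": "json",
    }
    return target_core_url.rstrip("/") + "/replication?" + urlencode(params)


def copy_index(core_pairs, dry_run=True):
    """Issue one fetchindex request per (target, source) core pair.

    With dry_run=True nothing is sent and the URLs are only returned, which
    makes it easy to review them before touching a live cluster.
    """
    urls = [fetchindex_url(target, source) for target, source in core_pairs]
    if not dry_run:
        for url in urls:
            with urlopen(url, timeout=30) as response:
                response.read()  # the response body is ignored in this sketch
    return urls


if __name__ == "__main__":
    # Hypothetical pairing: each shard replica in the serving cluster pulls
    # from the matching shard in the build cluster.
    pairs = [
        ("http://serve1:8983/solr/coll_shard1_replica1",
         "http://build1:8983/solr/coll_shard1_replica1"),
    ]
    for url in copy_index(pairs):
        print(url)
```

The pairs have to line up shard for shard, which is why both collections
need the same number of shards and the same sharding mechanism.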
