Thanks Jeff!

Thanks,
Greg

----- Original Message -----
From: "Jeff Wartes" <jwar...@whitepages.com>
To: solr-user@lucene.apache.org
Sent: Wednesday, August 20, 2014 10:36:07 AM
Subject: Re: How to restore an index from a backup over HTTP

Here’s the repo:
https://github.com/whitepages/solrcloud_manager

Comments/Issues/Patches welcome.

On 8/18/14, 11:28 AM, "Greg Solovyev" <g...@zimbra.com> wrote:

>Thanks Jeff, I'd be interested in taking a look at the code for this
>tool. My github ID is grishick.
>
>Thanks,
>Greg
>
>----- Original Message -----
>From: "Jeff Wartes" <jwar...@whitepages.com>
>To: solr-user@lucene.apache.org
>Sent: Monday, August 18, 2014 9:49:28 PM
>Subject: Re: How to restore an index from a backup over HTTP
>
>I'm able to do cross-solrcloud-cluster index copy using nothing more than
>careful use of the "fetchindex" replication handler command.
>
>I'm using this as a build/deployment tool, so I manually create a
>collection in two clusters, index into one, test, and then ask the other
>cluster to fetchindex from it on each shard/replica.
>
>Some caveats:
> 1. It seems like fetchindex may silently decline if it thinks the index
>it has is newer.
> 2. I'm not doing this on an index that's currently receiving updates.
> 3. SolrCloud replication doesn't come into this flow, even if you
>fetchindex on a leader. (although once you're done, updates should get
>replicated normally)
> 4. Both collections must be created with the same number of shards and
>sharding mechanism. (although replication factor can vary)
>
>
>I've got a tool for automating this that I'd like to push to github at
>some point, let me know if you're interested.
>
>
>On 8/16/14, 3:03 AM, "Greg Solovyev" <g...@zimbra.com> wrote:
>
>>Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty
>>straight forward, but the main concern I have is the internal data format
>>that ReplicationHandler and SnapPuller use.
>>This new handler as well as
>>the code that I've already written to download the index files from Solr
>>will depend on that format. Unfortunately, this format is not documented
>>and is not abstracted by SolrJ, so I wonder what I can do to make sure it
>>does not change on us without notice.
>>
>>Thanks,
>>Greg
>>
>>----- Original Message -----
>>From: "Shawn Heisey" <s...@elyograg.org>
>>To: solr-user@lucene.apache.org
>>Sent: Friday, August 15, 2014 7:31:19 PM
>>Subject: Re: How to restore an index from a backup over HTTP
>>
>>On 8/15/2014 5:51 AM, Greg Solovyev wrote:
>>> What I want to achieve is being able to send the backed up index to
>>>Solr (either standalone or with ZooKeeper) in a way similar to creating
>>>a new Collection. I.e. create a new collection and upload an existing
>>>index directly into that Collection. I've looked through Solr code and
>>>so far I have not found a handler that would allow this scenario. So,
>>>the last idea is to implement a special handler for this case, perhaps
>>>extending CoreAdminHandler. ReplicationHandler together with SnapPuller
>>>do pretty much what I need to do, except that the action has to be
>>>initiated by the receiving Solr server and I need to initiate the action
>>>externally. I.e., instead of having Solr slave download an index from
>>>Solr master, I need to feed the index to Solr master and ideally this
>>>would work the same way in standalone and SolrCloud modes.
>>
>>I have not made any attempt to verify what I'm stating below. It may
>>not work.
>>
>>What I think I would *try* is setting up a standalone Solr (no cloud) on
>>the backup server. Use scripted index/config copies and Solr start/stop
>>actions to get the index up and running on a known core in the
>>standalone Solr. Then use the replication handler's HTTP API to
>>replicate the index from that standalone server to each of the replicas
>>in your cluster.
>>
>>https://wiki.apache.org/solr/SolrReplication#HTTP_API
>>https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
>>
>>One thing that I do not know is whether SolrCloud itself might interfere
>>with these actions, or whether it might automatically take care of
>>additional replicas if you replicate to the shard leader. If SolrCloud
>>*would* interfere, then this idea might need special support in
>>SolrCloud, perhaps as an extension to the Collections API. If it won't
>>interfere, then the use-case would need to be documented (on the user
>>wiki at a minimum) so that committers will be aware of it and preserve
>>the capability in future versions. An extension to the Collections API
>>might be a good idea either way -- I've seen a number of questions about
>>capability that falls under this basic heading.
>>
>>Thanks,
>>Shawn
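
For anyone reading this thread in the archive, the per-replica fetchindex step discussed above might look roughly like the sketch below. It assumes the Solr 4.x-era replication handler parameters (command=fetchindex with a masterUrl pointing at the source core's /replication handler), as described in the HTTP API pages linked in the thread. The host names, core names, and both helper functions are hypothetical, and this is untested against a live cluster.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_fetchindex_url(target_core_url, source_core_url):
    """Build the URL that asks the target core to pull the source core's index.

    Both arguments are base core URLs (hypothetical example:
    http://host:8983/solr/collection1_shard1_replica1).
    """
    params = urlencode({
        "command": "fetchindex",
        # Assumption: Solr 4.x parameter name; later releases may differ.
        "masterUrl": source_core_url.rstrip("/") + "/replication",
        "wt": "json",
    })
    return target_core_url.rstrip("/") + "/replication?" + params


def fetch_index(pairs, timeout=30):
    """Send one fetchindex request per (target_core_url, source_core_url) pair.

    Returns the raw response bodies in the same order as the pairs.
    """
    responses = []
    for target, source in pairs:
        with urlopen(build_fetchindex_url(target, source), timeout=timeout) as resp:
            responses.append(resp.read().decode("utf-8"))
    return responses
```

You would call fetch_index with one pair per shard replica in the receiving cluster, each paired with the matching shard in the source cluster. The caveats listed in the thread still apply: matching shard count and sharding mechanism, no concurrent updates, and the possibility that fetchindex silently declines if it thinks its own index is newer.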