Thanks Jeff, I'd be interested in taking a look at the code for this tool. My GitHub ID is grishick.
Thanks,
Greg

----- Original Message -----
From: "Jeff Wartes" <jwar...@whitepages.com>
To: solr-user@lucene.apache.org
Sent: Monday, August 18, 2014 9:49:28 PM
Subject: Re: How to restore an index from a backup over HTTP

I'm able to do cross-solrcloud-cluster index copy using nothing more than
careful use of the "fetchindex" replication handler command.

I'm using this as a build/deployment tool, so I manually create a
collection in two clusters, index into one, test, and then ask the other
cluster to fetchindex from it on each shard/replica.

Some caveats:

1. It seems like fetchindex may silently decline if it thinks the index it
has is newer.
2. I'm not doing this on an index that's currently receiving updates.
3. SolrCloud replication doesn't come into this flow, even if you
fetchindex on a leader. (although once you're done, updates should get
replicated normally)
4. Both collections must be created with the same number of shards and
sharding mechanism. (although replication factor can vary)

I've got a tool for automating this that I'd like to push to GitHub at
some point, let me know if you're interested.

On 8/16/14, 3:03 AM, "Greg Solovyev" <g...@zimbra.com> wrote:

>Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty
>straightforward, but the main concern I have is the internal data format
>that ReplicationHandler and SnapPuller use. This new handler as well as
>the code that I've already written to download the index files from Solr
>will depend on that format. Unfortunately, this format is not documented
>and is not abstracted by SolrJ, so I wonder what I can do to make sure it
>does not change on us without notice.
>
>Thanks,
>Greg
>
>----- Original Message -----
>From: "Shawn Heisey" <s...@elyograg.org>
>To: solr-user@lucene.apache.org
>Sent: Friday, August 15, 2014 7:31:19 PM
>Subject: Re: How to restore an index from a backup over HTTP
>
>On 8/15/2014 5:51 AM, Greg Solovyev wrote:
>> What I want to achieve is being able to send the backed up index to
>> Solr (either standalone or with ZooKeeper) in a way similar to creating
>> a new Collection. I.e. create a new collection and upload an existing
>> index directly into that Collection. I've looked through Solr code and
>> so far I have not found a handler that would allow this scenario. So,
>> the last idea is to implement a special handler for this case, perhaps
>> extending CoreAdminHandler. ReplicationHandler together with SnapPuller
>> do pretty much what I need to do, except that the action has to be
>> initiated by the receiving Solr server and I need to initiate the action
>> externally. I.e., instead of having a Solr slave download an index from
>> a Solr master, I need to feed the index to the Solr master and ideally
>> this would work the same way in standalone and SolrCloud modes.
>
>I have not made any attempt to verify what I'm stating below. It may
>not work.
>
>What I think I would *try* is setting up a standalone Solr (no cloud) on
>the backup server. Use scripted index/config copies and Solr start/stop
>actions to get the index up and running on a known core in the
>standalone Solr. Then use the replication handler's HTTP API to
>replicate the index from that standalone server to each of the replicas
>in your cluster.
>
>https://wiki.apache.org/solr/SolrReplication#HTTP_API
>https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
>
>One thing that I do not know is whether SolrCloud itself might interfere
>with these actions, or whether it might automatically take care of
>additional replicas if you replicate to the shard leader.
>If SolrCloud *would* interfere, then this idea might need special support
>in SolrCloud, perhaps as an extension to the Collections API. If it won't
>interfere, then the use-case would need to be documented (on the user
>wiki at a minimum) so that committers will be aware of it and preserve
>the capability in future versions. An extension to the Collections API
>might be a good idea either way -- I've seen a number of questions about
>capability that falls under this basic heading.
>
>Thanks,
>Shawn
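
For reference, the per-replica fetchindex flow discussed in this thread could be sketched roughly as below. This is only an illustration, not Jeff's tool: the function names, the host names and core names in the usage note, and the choice to point masterUrl at the source core's /replication path are all assumptions, and the caveats listed above (no concurrent updates, fetchindex possibly declining on a newer index) still apply.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_fetchindex_url(target_core_url, source_core_url):
    """Return the URL that asks the target core to pull the index from the source core."""
    query = urlencode({
        "command": "fetchindex",
        # Assumption: masterUrl points at the source core's replication handler.
        "masterUrl": source_core_url.rstrip("/") + "/replication",
    })
    return target_core_url.rstrip("/") + "/replication?" + query


def plan_fetchindex(source_shards, target_shards):
    """Pair every target replica with the source core of the same shard.

    source_shards: {shard_name: source_core_url}
    target_shards: {shard_name: [target_replica_core_url, ...]}
    """
    # Both collections need the same shards; replica counts may differ.
    if set(source_shards) != set(target_shards):
        raise ValueError("source and target collections must have the same shards")
    return [
        build_fetchindex_url(replica, source_shards[shard])
        for shard in sorted(target_shards)
        for replica in target_shards[shard]
    ]


def run_plan(urls, timeout=30):
    """Issue each fetchindex request in turn (needs reachable Solr nodes)."""
    for url in urls:
        with urlopen(url, timeout=timeout) as response:
            print(url, response.status)
```

Something like plan_fetchindex({"shard1": "http://build-host:8983/solr/coll_shard1_replica1"}, {"shard1": ["http://prod-host:8983/solr/coll_shard1_replica1"]}) would give the list of URLs to pass to run_plan. The hosts and core names here are made up. Since fetchindex may decline silently, it is probably worth checking the index version on each replica afterwards rather than trusting the HTTP status alone.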