Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty straightforward, but my main concern is the internal data format that ReplicationHandler and SnapPuller use. This new handler, as well as the code I've already written to download the index files from Solr, will depend on that format. Unfortunately, the format is not documented and is not abstracted by SolrJ, so I wonder what I can do to make sure it does not change on us without notice.
Thanks,
Greg

----- Original Message -----
From: "Shawn Heisey" <s...@elyograg.org>
To: solr-user@lucene.apache.org
Sent: Friday, August 15, 2014 7:31:19 PM
Subject: Re: How to restore an index from a backup over HTTP

On 8/15/2014 5:51 AM, Greg Solovyev wrote:
> What I want to achieve is being able to send the backed up index to Solr
> (either standalone or with ZooKeeper) in a way similar to creating a new
> Collection. I.e. create a new collection and upload an existing index directly
> into that Collection. I've looked through Solr code and so far I have not
> found a handler that would allow this scenario. So, the last idea is to
> implement a special handler for this case, perhaps extending
> CoreAdminHandler. ReplicationHandler together with SnapPuller do pretty much
> what I need to do, except that the action has to be initiated by the
> receiving Solr server and I need to initiate the action externally. I.e.,
> instead of having Solr slave download an index from Solr master, I need to
> feed the index to Solr master and ideally this would work the same way in
> standalone and SolrCloud modes.

I have not made any attempt to verify what I'm stating below. It may not work.

What I think I would *try* is setting up a standalone Solr (no cloud) on the backup server. Use scripted index/config copies and Solr start/stop actions to get the index up and running on a known core in the standalone Solr. Then use the replication handler's HTTP API to replicate the index from that standalone server to each of the replicas in your cluster.

https://wiki.apache.org/solr/SolrReplication#HTTP_API
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler

One thing that I do not know is whether SolrCloud itself might interfere with these actions, or whether it might automatically take care of additional replicas if you replicate to the shard leader.
If SolrCloud *would* interfere, then this idea might need special support in SolrCloud, perhaps as an extension to the Collections API. If it won't interfere, then the use-case would need to be documented (on the user wiki at a minimum) so that committers will be aware of it and preserve the capability in future versions.

An extension to the Collections API might be a good idea either way -- I've seen a number of questions about capability that falls under this basic heading.

Thanks,
Shawn
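The replication step Shawn describes (pulling the index from a standalone "source" core onto each replica) can be sketched as building one `fetchindex` request per replica core. This is a minimal, untested sketch assuming the Solr 4.x ReplicationHandler HTTP API, where `command=fetchindex` accepts a one-time `masterUrl` request parameter pointing at the source core's `/replication` handler. The host and core names (`backup`, `restore_core`, `coll_shard1_replica1`, ...) are hypothetical, and like the suggestion itself, it does not account for whatever SolrCloud might do in response.

```python
from urllib.parse import urlencode


def fetchindex_url(replica_core_url: str, source_core_url: str) -> str:
    """Build a one-time fetchindex request for a single replica core.

    Assumes the Solr 4.x ReplicationHandler, which lets `masterUrl` be
    passed as a request parameter to `command=fetchindex`.
    """
    params = {
        "command": "fetchindex",
        # masterUrl must point at the *source* core's replication handler.
        "masterUrl": source_core_url.rstrip("/") + "/replication",
    }
    # The request itself is sent to the *receiving* replica's handler.
    return replica_core_url.rstrip("/") + "/replication?" + urlencode(params)


def fetchindex_urls(replica_core_urls, source_core_url):
    """One fetchindex request per replica core in the cluster."""
    return [fetchindex_url(r, source_core_url) for r in replica_core_urls]


if __name__ == "__main__":
    # Hypothetical hosts/cores: a standalone Solr on the backup server
    # serving the restored index, and two replicas of one shard.
    source = "http://backup:8983/solr/restore_core"
    replicas = [
        "http://solr1:8983/solr/coll_shard1_replica1",
        "http://solr2:8983/solr/coll_shard1_replica2",
    ]
    for url in fetchindex_urls(replicas, source):
        print(url)  # send each with an HTTP GET once the source core is up
```

The sketch only prints the URLs rather than sending them, so the requests can be reviewed (or fired from an existing deployment script) before touching a live cluster.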