Hello, I am looking for advice on implementing the following backup/restore scenario. We are using Solr to index email, and each mailbox has its own Collection. We do not store emails in Solr: the messages are stored on disk in a blob store, metadata is stored in a database, and Solr is used only for full-text search. The scenario is restoring a mailbox from a backup. A mailbox backup contains the blobs and the metadata (as an SQL file). We can also pull the Lucene index files from Solr using the ReplicationHandler, the same way Solr's SnapPuller does it on a slave server. We already have a restore utility that restores blobs and metadata, and we are now working on a mechanism to back up and restore the Solr index in a way that lets us package each mailbox into a separate backup folder/archive.
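For reference, the pull side that a slave performs through the ReplicationHandler is a three-step HTTP exchange: ask for the current index version/generation, list the files of that commit point, then download each file. The sketch below only builds those request URLs; it does not perform HTTP calls. The host, the core name `mbox_greg`, and the file names are hypothetical examples, and the `filestream` response still has to be decoded (it is a chunked packet format, not raw file bytes), as SnapPuller does on the receiving side.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds ReplicationHandler URLs for pulling one core's index (sketch; example values are hypothetical). */
public final class ReplicationPullUrls {
    private ReplicationPullUrls() {}

    // Drop a trailing slash so "http://host/solr/core/" and ".../core" behave the same.
    private static String trim(String coreUrl) {
        return coreUrl.endsWith("/") ? coreUrl.substring(0, coreUrl.length() - 1) : coreUrl;
    }

    // Step 1: ask the source core for its latest committed index version and generation.
    public static String indexVersion(String coreUrl) {
        return trim(coreUrl) + "/replication?command=indexversion&wt=json";
    }

    // Step 2: list the Lucene files that make up that commit point (generation).
    public static String fileList(String coreUrl, long generation) {
        return trim(coreUrl) + "/replication?command=filelist&generation=" + generation + "&wt=json";
    }

    // Step 3: download one file; wt=filestream returns length-prefixed chunks,
    // which the caller must decode back into the original file bytes.
    public static String fileContent(String coreUrl, long generation, String fileName) {
        return trim(coreUrl) + "/replication?command=filecontent&generation=" + generation
                + "&file=" + URLEncoder.encode(fileName, StandardCharsets.UTF_8) + "&wt=filestream";
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/mbox_greg"; // hypothetical mailbox core
        System.out.println(indexVersion(core));
        System.out.println(fileList(core, 2L));
        System.out.println(fileContent(core, 2L, "segments_2"));
    }
}
```

Each downloaded file could then be written into the mailbox's own backup folder, next to the blobs and the SQL metadata file.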
An obvious first idea for restoring is to drop the index files into a new folder on one of the existing Solr servers and make Solr pick up the new collection. That's simple, but this approach has two downsides:
1 - it requires SSH access between the machine where the backup-and-restore script runs and the Solr server;
2 - if Solr is running in SolrCloud mode, this approach bypasses ZooKeeper, so we would have to pick the Solr instance for the new Collection ourselves instead of letting ZooKeeper do it.

Another idea is to leave the index files out of the backups and re-index the mail on restore. This isn't a good idea at all when restoring large mailboxes.

What I want to achieve is to send the backed-up index to Solr (either standalone or with ZooKeeper) in a way similar to creating a new Collection, i.e., create a new Collection and upload an existing index directly into it. I've looked through the Solr code and so far I have not found a handler that supports this scenario. So the last idea is to implement a special handler for this case, perhaps by extending CoreAdminHandler. ReplicationHandler together with SnapPuller does pretty much what I need, except that the action has to be initiated by the receiving Solr server, whereas I need to initiate it externally. In other words, instead of having a Solr slave download an index from the Solr master, I need to push the index to the Solr master, and ideally this would work the same way in standalone and SolrCloud modes.

What are your thoughts and ideas on the subject?

Thanks,
Greg
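The create-then-upload flow described in the post can be sketched as an ordered list of requests. The CREATE and RELOAD actions of the Collections API (SolrCloud) and CoreAdmin API (standalone) exist in Solr; the `/indexupload` endpoint is HYPOTHETICAL and stands in for the custom handler being proposed, which Solr does not ship. The config name `mailbox_conf`, the collection name, and the file names are also hypothetical. In SolrCloud mode, the CREATE step is what lets ZooKeeper choose the node, which avoids the second downside of copying files over SSH.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Request sequence for "create a Collection, then push an existing index into it" (sketch). */
public final class IndexPushPlan {
    private IndexPushPlan() {}

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static List<String> plan(String solrBase, String name, boolean cloud,
                                     String configName, List<String> indexFiles) {
        String base = solrBase.endsWith("/") ? solrBase.substring(0, solrBase.length() - 1) : solrBase;
        String n = enc(name);
        List<String> steps = new ArrayList<>();
        if (cloud) {
            // SolrCloud: the Collections API lets ZooKeeper decide which node hosts the new collection.
            steps.add("GET " + base + "/admin/collections?action=CREATE&name=" + n
                    + "&numShards=1&replicationFactor=1&collection.configName=" + enc(configName));
        } else {
            // Standalone: CoreAdmin CREATE; instanceDir is assumed to already contain a conf/ directory.
            steps.add("GET " + base + "/admin/cores?action=CREATE&name=" + n + "&instanceDir=" + n);
        }
        // HYPOTHETICAL custom handler: one POST per backed-up Lucene index file.
        for (String f : indexFiles) {
            steps.add("POST " + base + "/" + n + "/indexupload?file=" + enc(f));
        }
        // Reload so a new searcher opens on the uploaded index
        // (assumes the custom handler does not reopen the searcher itself).
        if (cloud) {
            steps.add("GET " + base + "/admin/collections?action=RELOAD&name=" + n);
        } else {
            steps.add("GET " + base + "/admin/cores?action=RELOAD&core=" + n);
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> files = List.of("segments_2", "_0.cfs", "_0.cfe", "_0.si");
        for (String step : plan("http://localhost:8983/solr", "mbox_greg", true, "mailbox_conf", files)) {
            System.out.println(step);
        }
    }
}
```

Keeping the same sequence for both modes, and switching only the admin API used for CREATE and RELOAD, is one way to get the "works the same in standalone and SolrCloud" behavior the post asks for.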