Hello, I am looking for advice on implementing the following backup/restore 
scenario. 
We are using Solr to index email. Each mailbox has its own collection. We do
not store emails in Solr: the emails are stored on disk in a blob store,
metadata is stored in a database, and Solr is used only for full-text search.
The scenario is restoring a mailbox from a backup. The backup of a mailbox
contains the blobs and the metadata in an SQL file. We can also pull Lucene
index files from Solr using ReplicationHandler, the same way Solr's SnapPuller
does it on a slave server. We already have a restore utility that restores
blobs and metadata, and we are now working on a mechanism to back up and
restore the Solr index in a way that allows us to package each mailbox into a
separate backup folder/archive.
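For reference, the pull side that the backup relies on boils down to three
ReplicationHandler requests: get the current index version/generation, list the
files in that commit point, then download each file. The sketch below only
builds those URLs. It assumes the handler is registered at `/replication` in
each core's solrconfig.xml and that the core is named after the mailbox
(`mbox1` is a hypothetical name). Note that `wt=filestream` returns a framed
binary stream (chunk sizes and, when requested, checksums), not raw file bytes,
so the download has to be decoded the way SnapPuller does it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper that builds the ReplicationHandler URLs a backup script
 * could call, mirroring what SnapPuller requests on a slave.
 * Assumes the handler lives at /replication (configurable in solrconfig.xml).
 */
public final class ReplicationUrls {
    private ReplicationUrls() {}

    // Base URL of the replication handler for one core.
    static String handlerUrl(String solrBase, String core) {
        return solrBase + "/" + core + "/replication";
    }

    // Step 1: ask for the current index version and generation.
    static String indexVersionUrl(String solrBase, String core) {
        return handlerUrl(solrBase, core) + "?command=indexversion&wt=json";
    }

    // Step 2: list the files that make up the commit point for that generation.
    static String fileListUrl(String solrBase, String core, long generation) {
        return handlerUrl(solrBase, core)
                + "?command=filelist&generation=" + generation + "&wt=json";
    }

    // Step 3: download one index file as a framed filestream (not raw bytes).
    static String fileContentUrl(String solrBase, String core, long generation, String file) {
        return handlerUrl(solrBase, core)
                + "?command=filecontent&wt=filestream&generation=" + generation
                + "&file=" + URLEncoder.encode(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // hypothetical server
        System.out.println(indexVersionUrl(base, "mbox1"));
        System.out.println(fileListUrl(base, "mbox1", 5L));
        System.out.println(fileContentUrl(base, "mbox1", 5L, "_0.cfs"));
    }
}
```

The generation returned in step 1 is what pins steps 2 and 3 to one consistent
commit point, so files from a later commit are not mixed into the backup.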

An obvious first idea for restoring is to drop the index files into a new
folder on one of the existing Solr servers and make it pick up the new
collection. That's simple. However, this approach has two downsides: (1) it
requires SSH access between the machine running the backup-and-restore script
and the Solr server, and (2) if Solr is running in SolrCloud mode, this
approach bypasses ZooKeeper, so we would have to pick the Solr instance for the
new collection without ZooKeeper's help.
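On a standalone server, the "make it pick up" step of the drop-in approach
could be a CoreAdmin CREATE request whose dataDir points at the folder holding
the restored index (by default Solr reads the index from the `index`
subdirectory of dataDir). The sketch only builds that URL; the core name and
paths are hypothetical, and the instanceDir is assumed to already contain a
conf directory with solrconfig.xml and schema.xml.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public final class CoreCreateUrl {
    private CoreCreateUrl() {}

    // Builds a CoreAdmin CREATE request; LinkedHashMap keeps parameter order stable.
    static String build(String solrBase, String core, String instanceDir, String dataDir) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", core);
        params.put("instanceDir", instanceDir);
        // Hypothetical restore location; the index files go in <dataDir>/index.
        params.put("dataDir", dataDir);
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return solrBase + "/admin/cores?" + query;
    }

    public static void main(String[] args) {
        // Hypothetical server, core name and path.
        System.out.println(build("http://localhost:8983/solr", "mbox1", "mbox1", "/var/solr/mbox1/data"));
    }
}
```

Both downsides are visible here: the files must already be on that server's
disk before the call, and the request targets one specific node rather than
letting ZooKeeper place the collection.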

Another idea is to leave index files out of backups and re-index mail upon
restore. This is not a good idea at all when restoring large mailboxes.

What I want to achieve is being able to send the backed-up index to Solr
(either standalone or with ZooKeeper) in a way similar to creating a new
collection, i.e., create a new collection and upload an existing index directly
into it. I've looked through the Solr code and so far I have not found a
handler that supports this scenario. So, the last idea is to implement a
special handler for this case, perhaps extending CoreAdminHandler.
ReplicationHandler together with SnapPuller does pretty much what I need,
except that the action has to be initiated by the receiving Solr server, while
I need to initiate it externally. That is, instead of having a Solr slave
download an index from a Solr master, I need to push the index to the Solr
master, and ideally this would work the same way in standalone and SolrCloud
modes.

What are your thoughts and ideas on the subject? 

Thanks, 
Greg 
