This sounds pretty complete to me.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Jun 11, 2013 4:21 AM, "Cosimo Streppone" <cos...@streppone.it> wrote:
> Hi,
>
> I'd like your advice on this backup plan.
> It's my first Solr deployment (4.0).
>
> Production consists of 1 master and n frontend slaves
> placed in different datacenters, replicating through HTTP.
> Only the master is backed up. Frontend slaves can die anytime
> or go stale for a while, and that's ok.
>
> Backup is performed daily. The steps are:
>
> 1) Ping the /replication handler with command=backup and numberToKeep=3
>    and verify that we get a status=0
>
> 2) Check the replication handler with command=details and verify that
>    we get a "snapshotCompletedAt". If not, spin and wait for it.
>
> 3) Snapshot is completed. Rsync --delete everything to a different
>    volume on the same host. This is to keep a complete archived *local*
>    copy should the index SSD drive fail.
>
> 4) Once the rsync is finished, a standby machine downloads the archived
>    copy from the master and rebuilds everything under a "restore" core.
>
> 5) The new "restore" core is started up via the /admin/cores handler
>    (action=CREATE IIRC)
>
> 6) Nagios checks that we can query the restore core correctly
>    and get back at least one document from it.
>
> In this way, I get:
> - 3 (n) quick snapshots done by Solr itself. Older ones are discarded
>   automatically
> - 1 full index copy on a secondary volume
> - 1 "offsite" copy on another machine
> - a daily automated restore that verifies that our backup is valid
>
> It's been running reliably for a week or so now,
> but surely someone out there must have done this before.
>
> Did I miss something?
>
> --
> Cosimo
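For anyone wiring this up, steps 1 and 2 could be scripted roughly as
below. This is a minimal sketch, not a drop-in script: the host, port,
core name, and the exact JSON paths into the command=details response
are assumptions that should be checked against your own Solr 4.x
responses.

    import json
    import time
    import urllib.request

    # Hypothetical master URL and core name; adjust for your deployment.
    SOLR = "http://localhost:8983/solr/collection1"

    def replication(command, extra=""):
        # json.nl=map makes Solr serialize its NamedLists as JSON objects,
        # which keeps the lookups below simple.
        url = "%s/replication?command=%s%s&wt=json&json.nl=map" % (
            SOLR, command, extra)
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)

    # Step 1: trigger a snapshot, keeping only the three most recent.
    reply = replication("backup", "&numberToKeep=3")
    if reply["responseHeader"]["status"] != 0:
        raise SystemExit("backup command failed: %r" % reply)

    # Step 2: spin until command=details reports snapshotCompletedAt.
    while True:
        details = replication("details")
        backup = details.get("details", {}).get("backup", {})
        if "snapshotCompletedAt" in backup:
            break
        time.sleep(10)

Run from cron before the rsync of step 3, so the archive copy only ever
sees a completed snapshot.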
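Step 3 is then a plain rsync. A sketch with hypothetical paths (in Solr
4.x the snapshots land as snapshot.<timestamp> directories inside the
index data dir):

    import subprocess

    # Step 3: mirror the data directory (which now contains the
    # snapshot.* directories Solr just wrote) onto a second local volume.
    # Both paths are hypothetical.
    subprocess.check_call([
        "rsync", "-a", "--delete",
        "/var/solr/data/",         # index dir holding the snapshot dirs
        "/mnt/backup/solr-data/",  # archive copy on a different volume
    ])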
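For step 5, note that the CoreAdmin handler in Solr 4.x takes
action=CREATE rather than command=CREATE. A sketch, with hypothetical
directories for wherever the standby machine unpacked the archived copy:

    import urllib.parse
    import urllib.request

    # Step 5: register the restored index as a new "restore" core
    # on the standby box.
    params = urllib.parse.urlencode({
        "action": "CREATE",
        "name": "restore",
        "instanceDir": "/var/solr/restore",       # hypothetical
        "dataDir": "/var/solr/restore/data",      # hypothetical
        "wt": "json",
    })
    url = "http://localhost:8983/solr/admin/cores?" + params
    with urllib.request.urlopen(url, timeout=30) as resp:
        print(resp.read().decode())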
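And step 6 as a Nagios check could be as small as this sketch, assuming
the same hypothetical host and core names as above and the usual Nagios
plugin exit codes (0 = OK, 2 = CRITICAL):

    import json
    import sys
    import urllib.request

    # Step 6: verify the restore core answers queries and holds
    # at least one document.
    URL = "http://localhost:8983/solr/restore/select?q=*:*&rows=1&wt=json"

    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            found = json.load(resp)["response"]["numFound"]
    except Exception as exc:
        print("CRITICAL: restore core not responding: %s" % exc)
        sys.exit(2)

    if found < 1:
        print("CRITICAL: restore core is empty")
        sys.exit(2)

    print("OK: restore core has %d documents" % found)
    sys.exit(0)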