Hi, I'd like your advice on this backup plan. It's my first Solr deployment (4.0).
Production consists of 1 master and n frontend slaves placed in different datacenters, replicating over HTTP. Only the master is backed up; frontend slaves can die at any time or go stale for a while, and that's OK. The backup runs daily. The steps are:

1) Ping the /replication handler with command=backup and numberToKeep=3, and verify that we get back status=0.

2) Check the replication handler with command=details and verify that we get a "snapshotCompletedAt" timestamp. If not, spin and wait for it. (A rough sketch of steps 1-2 is in the P.S. below.)

3) The snapshot is now complete. Rsync --delete everything to a different volume on the same host. This is to keep a complete archived *local* copy should the index SSD drive fail.

4) Once the rsync is finished, a standby machine downloads the archived copy from the master and rebuilds everything under a "restore" core.

5) The new "restore" core is started up through the /admin/cores handler (action=CREATE; also sketched in the P.S.).

6) Nagios checks that we can query the restore core correctly and get back at least one document from it. (The check script is in the P.S. too.)

In this way, I get:

- 3 (n = numberToKeep) quick snapshots done by Solr itself; older ones are discarded automatically
- 1 full index copy on a secondary volume
- 1 "offsite" copy on another machine
- a daily automated restore that verifies that our backup is valid

It's been running reliably for a week or so now, but surely someone out there must have done this before. Did I miss something?

-- Cosimo
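
P.S. In case it's useful, steps 1-2 boil down to roughly the following. The URL is a placeholder for our real master core, and the polling is deliberately dumb:

    # Steps 1-2: trigger the backup, then wait until Solr says it's done.
    import json
    import time
    import urllib.request

    MASTER = "http://master:8983/solr/core0"   # placeholder

    def replication(params):
        """Call the /replication handler, return the raw JSON response."""
        url = "%s/replication?wt=json&%s" % (MASTER, params)
        return urllib.request.urlopen(url).read().decode("utf-8")

    # Step 1: kick off a snapshot, keeping only the last 3.
    resp = json.loads(replication("command=backup&numberToKeep=3"))
    if resp["responseHeader"]["status"] != 0:
        raise SystemExit("backup command failed: %s" % resp)

    # Step 2: spin on command=details until snapshotCompletedAt shows up.
    # Just looking for the key in the raw response is crude, but it works.
    while "snapshotCompletedAt" not in replication("command=details"):
        time.sleep(10)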
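
Steps 3 and 5 are roughly this (paths and hostnames are made up, and error handling is elided):

    # Step 3: full local copy on a secondary volume (runs on the master).
    import subprocess
    import urllib.request

    INDEX_DIR = "/data/solr/core0/data/"   # live index + snapshot.* dirs
    ARCHIVE_DIR = "/backup/solr/core0/"    # different volume, same host

    subprocess.check_call(["rsync", "-a", "--delete", INDEX_DIR, ARCHIVE_DIR])

    # Step 4 happens on the standby machine: it downloads ARCHIVE_DIR
    # from the master and puts the snapshot under the restore core's
    # data directory.

    # Step 5: create the restore core through the CoreAdmin handler.
    STANDBY = "http://standby:8983/solr"
    urllib.request.urlopen(
        "%s/admin/cores?action=CREATE&name=restore"
        "&instanceDir=/data/solr/restore" % STANDBY)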
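
And the Nagios check for step 6 is more or less this (standard plugin exit codes, 0=OK, 2=CRITICAL):

    # Step 6: verify the restored core actually answers queries.
    import json
    import sys
    import urllib.request

    RESTORE = "http://standby:8983/solr/restore"   # placeholder

    try:
        raw = urllib.request.urlopen(
            "%s/select?q=*:*&rows=1&wt=json" % RESTORE, timeout=30).read()
        found = json.loads(raw)["response"]["numFound"]
    except Exception as e:
        print("CRITICAL: restore core not answering: %s" % e)
        sys.exit(2)

    if found < 1:
        print("CRITICAL: restore core is empty")
        sys.exit(2)

    print("OK: restore core has %d docs" % found)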