We are running two Solr servers (master/slave) on EC2, with the Solr home directories on EBS volumes that we snapshot every 12 hours. That means we will lose at most 12 hours of data, but I am wondering whether there is a way to reduce the window of data loss. With our MySQL servers, we snapshot every 12 hours but also copy the binary logs to S3 every 5 minutes.
We commit every 10 minutes on the master and will be using the built-in Java replication (today we use snapshotting to replicate, but we are in the process of migrating from Solr 1.3 to 1.4).

On a related note, are we doing the right thing in having our slave's Solr home directory on an EBS volume? If the slave were to die and we had to create a fresh one, would it just resync the entire index from the master? Is the reason to have the slave on an EBS volume so that it has less data to resync on startup?

Thanks in advance,
Athir