On 1/3/2014 10:34 AM, Daniel Collins wrote:
We see this a lot as well. My understanding is that recovery asks the
leader for a list of the files that it should download, then it downloads
them.  But if the leader has been merging segments whilst this is going on
(recovery is taking a reasonable period of time and you have an NRT system
where commits/merges are reasonably frequent), then the segments might
disappear during this recovery period, hence the replica can't download
them.

So it's an error, but one the system can recover from: it will
re-recover, which should pick up a (larger) segment next time, one that is
less likely to be removed whilst recovery is going on.
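The race described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Solr's actual recovery code: the names `Leader`, `recover`, `reserve_seconds` and the timings are hypothetical. The replica snapshots the leader's file list, a merge on the leader drops some of those files mid-download, and the replica gives up and re-recovers from a fresh list. A `reserve_seconds` window (standing in for the reservation discussed in the reply) keeps merged-away files fetchable for a while.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    """Toy leader (hypothetical): live segment files plus when merged-away files were dropped."""
    live: set[str]
    reserve_seconds: float = 10.0  # stands in for the 10-second default mentioned in the reply
    deleted_at: dict[str, float] = field(default_factory=dict)

    def merge(self, now: float, victims: list[str], merged: str) -> None:
        # A merge replaces several segments with one larger segment; the old
        # files are marked as deleted at time `now`.
        for name in victims:
            self.live.discard(name)
            self.deleted_at[name] = now
        self.live.add(merged)

    def can_serve(self, name: str, now: float) -> bool:
        if name in self.live:
            return True
        # A merged-away file can still be fetched while inside the reserve window.
        dropped = self.deleted_at.get(name)
        return dropped is not None and now - dropped < self.reserve_seconds


def recover(leader: Leader, merges: list[tuple[float, list[str], str]],
            seconds_per_file: float) -> tuple[int, float]:
    """Toy replica recovery loop. `merges` holds (time, victims, merged) in time order.

    Returns (number of recovery attempts, time at which recovery finished)."""
    now = 0.0
    next_merge = 0
    attempts = 0
    while True:
        attempts += 1
        file_list = sorted(leader.live)  # ask the leader which files to fetch
        succeeded = True
        for name in file_list:
            # Apply any merges the leader performed while we were downloading.
            while next_merge < len(merges) and merges[next_merge][0] <= now:
                when, victims, merged = merges[next_merge]
                leader.merge(when, victims, merged)
                next_merge += 1
            if not leader.can_serve(name, now):
                succeeded = False  # file vanished: abandon this attempt and re-recover
                break
            now += seconds_per_file
        if succeeded:
            return attempts, now
```

With three segments, 30 seconds per file and a merge of `a` and `b` at t=20, `recover(Leader({"a", "b", "c"}), [(20.0, ["a", "b"], "ab")], 30.0)` needs two attempts with the 10-second window (the second attempt fetches the larger merged segment `ab`), whereas `reserve_seconds=60.0` lets the first attempt finish.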

This can probably be fixed by increasing the commitReserveDuration setting in the master section of the replication handler configuration. In the context of replication, this controls how long Solr (really, Lucene) will hold on to segments that are slated for deletion. It defaults to 10 seconds, but when you have a lot of data to replicate, a replication can take considerably longer than 10 seconds to finish.

http://wiki.apache.org/solr/SolrReplication#Master
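To pick a value, a rough back-of-envelope estimate may help. This sketch assumes, as described above, that the window should cover the whole copy; the helper names and the default safety factor of 2 are illustrative choices, not part of Solr. It also assumes the HH:MM:SS value format used in the wiki's example (e.g. 00:00:10). It estimates transfer time from index size and throughput, never goes below the 10-second default, and renders the line for the master section.

```python
import math


def commit_reserve_duration(index_bytes: int, bytes_per_second: float,
                            safety_factor: float = 2.0) -> str:
    """Hypothetical helper: estimated full-copy time times a safety margin, as HH:MM:SS."""
    seconds = math.ceil(index_bytes / bytes_per_second * safety_factor)
    seconds = max(seconds, 10)  # never go below the 10-second default
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def master_snippet(duration: str) -> str:
    """Hypothetical helper: render the setting as it would appear in the master section."""
    return f'<str name="commitReserveDuration">{duration}</str>'


if __name__ == "__main__":
    # Example: a 50 GB index copied at roughly 100 MB/s
    print(master_snippet(commit_reserve_duration(50 * 10**9, 100 * 10**6)))
```

For that example the estimate is 500 seconds doubled to 1000, so it prints `<str name="commitReserveDuration">00:16:40</str>`. Measure your actual replication time rather than trusting the throughput guess.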

Thanks,
Shawn
