Shawn,

Thank you for your answer. For testing purposes we have a test environment where we are no longer indexing. We also disabled the DIH delta import, so as I understand it there shouldn't be any commits on the master. I also tried <str name="commitReserveDuration">50:50:50</str> and got the same failure.
I tried changing and increasing various parameters on the master and slave, but no luck yet. The master is functioning OK: we do get search results, so I assume there is no index corruption on the master side. Just to mention, we have done this many times over the past few years; the problem started only now, after we upgraded Solr from version 3.6 to version 4.3 and reindexed all documents.

This is holding up an upgrade of our production site and various customers. If we have no solution soon, do you think we could copy the index directory from the master to the slave and hope that future replication will work?

Thank you again.
Shalom

On Wed, Oct 30, 2013 at 10:00 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 10/30/2013 1:49 PM, Shalom Ben-Zvi Kazaz wrote:
>> We are continuously getting this exception during replication from
>> master to slave. Our index size is 9.27 G and we are trying to replicate
>> a slave from scratch.
>> It's a different file each time; sometimes we get to 60% replication
>> before it fails and sometimes only 10%. We never managed a successful
>> replication.
>
> <snip>
>
>> This is the master setup:
>>
>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>   <lst name="master">
>>     <str name="replicateAfter">commit</str>
>>     <str name="replicateAfter">startup</str>
>>     <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
>>     <str name="commitReserveDuration">00:00:50</str>
>>   </lst>
>> </requestHandler>
>
> I assume that you're probably doing commits fairly often, resulting in a
> lot of merge activity that frequently deletes segments. That
> "commitReserveDuration" parameter needs to be made larger. I would imagine
> that it takes a lot more than 50 seconds to do the replication - even if
> you've got an extremely fast network, replicating 9.7GB probably takes
> several minutes.
>
> From the wiki page on replication: "If your commits are very frequent and
> network is particularly slow, you can tweak an extra attribute <str
> name="commitReserveDuration">00:00:10</str>. This is roughly the time
> taken to download 5MB from master to slave. Default is 10 secs."
>
> http://wiki.apache.org/solr/SolrReplication#Master
>
> You've said that your network is not slow, but with that much data, all
> networks are slow.
>
> Thanks,
> Shawn
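
As a rough check on the point above that commitReserveDuration should outlast the transfer, here is a minimal Python sketch that estimates how long copying the index might take and formats the result as HH:MM:SS. The 9.27 GB size comes from this thread; the throughput figures, the safety factor of 2, and all function names are assumptions made up for illustration, not measured values and not part of Solr.

```python
import math


def transfer_seconds(size_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate the seconds needed to copy size_gb at a sustained rate."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb * 1024 / throughput_mb_per_s


def to_hh_mm_ss(seconds: float) -> str:
    """Round up to whole seconds and format as HH:MM:SS."""
    total = math.ceil(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def reserve_duration(size_gb: float, throughput_mb_per_s: float,
                     safety_factor: float = 2.0) -> str:
    """Suggest a duration covering the whole transfer plus headroom.

    safety_factor is an arbitrary assumption, not a Solr recommendation.
    """
    estimate = transfer_seconds(size_gb, throughput_mb_per_s) * safety_factor
    return to_hh_mm_ss(estimate)


if __name__ == "__main__":
    index_size_gb = 9.27  # index size reported in this thread
    # Hypothetical sustained throughputs in MB/s; measure your own network.
    for rate in (10, 50, 100):
        value = reserve_duration(index_size_gb, rate)
        print(f"{rate:>3} MB/s -> "
              f'<str name="commitReserveDuration">{value}</str>')
```

Even at an assumed sustained 100 MB/s the estimate comes out to a few minutes, well above the configured 00:00:50, which is consistent with the advice to raise the value. It does not explain why 50:50:50 still failed, so the cause may lie elsewhere.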