On 10/31/2013 7:26 AM, Shalom Ben-Zvii Kazaz wrote:
> Shawn, Thank you for your answer.
> for the purpose of testing it we have a test environment where we are not
> indexing anymore. We also disabled the DIH delta import. so as I understand
> there shouldn't be any commits on the master.
> I also tried with
> <str name="commitReserveDuration">50:50:50</str>
> and get the same failure.

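For reference, commitReserveDuration belongs in the master section of the replication handler in solrconfig.xml. The fragment below is only a sketch of where it normally sits: the handler name, the replicateAfter trigger, and the confFiles list are assumptions about a typical setup rather than your actual config, and the 00:00:10 value is illustrative (as far as I know it is the documented default, in HH:mm:ss form).

```xml
<!-- Sketch of a master-side replication handler; names and values are assumed. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Publish a new index version to slaves after each commit. -->
    <str name="replicateAfter">commit</str>
    <!-- Config files to copy to slaves along with the index (assumed list). -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
    <!-- How long a commit point is reserved while a slave is still fetching it. -->
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>
```
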
If it's in an environment where there are no commits, that's really odd. I would suspect underlying filesystem or network issues.

There's one problem that's not well known, but is very common - problems with NIC firmware, most commonly Broadcom NICs. These problems result in things working correctly almost all the time, but when there is a high network load, things break in strange ways, and the resulting errors rarely look like they are network-related.

Most embedded NICs are either Broadcom or Realtek, both of which are famous for their firmware problems. Broadcom NICs are very common on Dell and HP servers. Upgrading the firmware (which is not usually the same thing as upgrading the driver) is the only fix. NICs from other manufacturers also have upgradable firmware, but don't usually have the same kind of high-profile problems as Broadcom.

The NIC firmware might not have anything to do with this problem, but it's the only thing left that I can think of. I personally haven't used replication since Solr 1.4.1, but a lot of people do. I can't say that there's no bugs, but so far I'm not seeing the kind of problem reports that appear when a bug in a critical piece of the software exists.

Thanks,
Shawn