For the record, this was caused by a rookie mistake: FD exhaustion.

--Casey

On 8/24/12 11:24 AM, Casey Callendrello wrote:
> Hi there,
> I have been doing some load testing with Solr 4 beta (now, trunk). My
> configuration is fairly simple - two servers, replicating via
> SolrCloud. SolrCloud is configured as recommended in the wiki:
>
> <updateRequestProcessorChain name="standard">
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.DistributedUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> Twice now I've seen sudden thread and file-descriptor spikes along
> with a complete deadlock, simultaneously on both machines. My max FDs
> is set to 1024, and (excepting the spikes) I never see usage over 375
> fds.
>
> The first FD spike was with an older trunk revision. It was
> co-incident with a corrupt transaction log. I've lost the logs,
> unfortunately, but SOLR tried to re-process the same log over and
> over, leaking FDs and dying.
>
> The upgraded version has not reported the corrupt transaction issue
> prior to deadlock. However, according to the log files, the deadlock
> persists for about 5 minutes prior to FD exhaustion. The last log line
> is simply "INFO: end_commit_flush"
>
> Upon restart, I see a frightening amount of corrupt transaction log
> exceptions and " New transaction log already exists" exceptions.
>
> Any thoughts?
> Contact me for the thread dump; it's 1 MiB.
>
> Thanks,
> --Casey C.
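
For anyone who hits the same symptom: a rough way to watch a process's open file descriptors against its limit, so a spike shows up before exhaustion. This is only a sketch and assumes Linux with /proc mounted; the helper names (parse_soft_nofile, fd_usage, near_limit) and the 0.8 threshold are made up for illustration and are not part of Solr.

```python
import os


def parse_soft_nofile(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: "Max", "open", "files", <soft>, <hard>, "files"
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid="self"):
    """Return (open_fd_count, soft_limit) for a process (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft_limit = parse_soft_nofile(f.read())
    return open_fds, soft_limit


def near_limit(open_fds, soft_limit, threshold=0.8):
    """True when usage is at or above `threshold` of the soft limit."""
    return soft_limit is not None and open_fds >= threshold * soft_limit
```

Calling fd_usage() with the pid of the Solr JVM (rather than the default "self") and passing the result to near_limit() from a cron job or monitoring check would flag the climb. With the numbers from the original message, 375 open FDs against a limit of 1024 is well under the threshold.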