I'm running Solr 1.4+ under Tomcat 6, with indexing and searching requests simultaneously hitting the same Solr machine. Sometimes Solr, Tomcat, and my (C#) indexing process conspire to render search inoperable. So far I've only noticed this while big segment merges (i.e. merges that take multiple minutes) are taking place.
Let me explain the situation as best as I understand it. My indexer has a main loop that looks roughly like this: while true: try: submit a new add or delete request to Solr via HTTP catch timeoutException: sleep a few seconds When things are going wrong (i.e., when a large segment merge is happening), this loop is problematic: * When the indexer's request hits Solr, then the corresponding thread in Tomcat blocks. (It looks to me like the thread is destined to block until the entire merge is complete. I'll paste in what the Java stack traces look like at the end of the message if they can help diagnose things.) * Because the Solr thread stays blocked for so long, eventually the indexer hits a timeoutException. (That is, it gives up on Solr.) * Hitting the timeout exception doesn't cause the corresponding Tomcat thread to die or unblock. Therefore, each time through the loop, another Solr-handling thread inside Tomcat enters a blocked state. * Eventually so many threads (maxThreads, whose Tomcat default is 200) are blocked that Tomcat starts rejecting all new Solr HTTP requests -- including those coming in from the web tier. * Users are unable to search. The problem might self-correct once the merge is complete, but that could be quite a while. What are my options for changing Solr settings or changing my indexing process to avoid this lockup scenario? Do you agree that the segment merge is helping cause the lockup? Do adds and deletes really need to block on segment merges? Partial thread dumps follow, showing example add and delete threads that are blocked. Also the active Lucene Merge Thread, and the thread that kicked off the merge. [doc deletion thread, waiting for DirectUpdateHandler2.iwCommit.lock() to return] "http-1800-200" daemon prio=6 tid=0x000000000a58cc00 nid=0x1028 waiting on condition [0x000000000f9ae000..0x000000000f9afa90] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000016d801ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(Unknown Source) at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:320) at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:71) at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:234) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:180) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Unknown Source) [doc adding thread, waiting for DirectUpdateHandler2.iwAccess.lock() to return] "http-1800-70" daemon prio=6 tid=0x0000000007946400 nid=0x590 waiting on condition [0x000000000d76e000..0x000000000d76f910] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000016d801ae0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:211) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:118) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:123) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:192) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Unknown Source) All 200 front-line Solr Tomcat threads, from http-1800-1 through http-1800-200, are blocked in one of these two ways. [The thread that kicked off the commit/merge. Note that DirectUpdateHandler2.commit() calls iwCommit.lock(), and seems not to release the iwCommit lock until DIH2.commit() returns.] "pool-15-thread-1" prio=6 tid=0x000000000abc3c00 nid=0x1094 in Object.wait() [0x000000000b6ff000..0x000000000b6ff910] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000001b1ba84a0> (a org.apache.solr.update.SolrIndexWriter) at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5351) - locked <0x00000001b1ba84a0> (a org.apache.solr.update.SolrIndexWriter) at org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:3479) - locked <0x00000001b1ba84a0> (a org.apache.solr.update.SolrIndexWriter) at org.apache.lucene.index.IndexWriter.finishMerges(IndexWriter.java:3463) - locked <0x00000001b1ba84a0> (a org.apache.solr.update.SolrIndexWriter) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:2200) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2153) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2117) at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:230) at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:181) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409) at org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run(DirectUpdateHandler2.java:602) - locked <0x000000016d801760> (a org.apache.solr.update.DirectUpdateHandler2$CommitTracker) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) [The only active Lucene merge thread] "Lucene Merge Thread #0" daemon prio=6 tid=0x000000000930ac00 nid=0xb38 runnable [0x000000000a41e000..0x000000000a41f790] java.lang.Thread.State: RUNNABLE at java.io.RandomAccessFile.readBytes(Native Method) at java.io.RandomAccessFile.read(Unknown Source) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(SimpleFSDirectory.java:132) - locked <0x00000001b443e588> (a org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor) at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157) at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38) at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80) at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:73) at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69) at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:707) at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:648) at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:586) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:154) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5029) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)