I'm running Solr 1.4+ under Tomcat 6, with indexing and searching
requests simultaneously hitting the same Solr machine. Sometimes Solr,
Tomcat, and my (C#) indexing process conspire to render search
inoperable. So far I've only noticed this while big segment merges
(i.e. merges that take multiple minutes) are taking place.

Let me explain the situation as best as I understand it.

My indexer has a main loop that looks roughly like this:

  while true:
    try:
      submit a new add or delete request to Solr via HTTP
    catch timeoutException:
      sleep a few seconds

When things are going wrong (i.e., when a large segment merge is
happening), this loop is problematic:

* When the indexer's request hits Solr, then the corresponding thread
in Tomcat blocks. (It looks to me like the thread is destined to block
until the entire merge is complete. I'll paste in what the Java stack
traces look like at the end of the message if they can help diagnose
things.)
* Because the Solr thread stays blocked for so long, eventually the
indexer hits a timeoutException. (That is, it gives up on Solr.)
* Hitting the timeout exception doesn't cause the corresponding Tomcat
thread to die or unblock. Therefore, each time through the loop,
another Solr-handling thread inside Tomcat enters a blocked state.
* Eventually so many threads (maxThreads, whose Tomcat default is 200)
are blocked that Tomcat starts rejecting all new Solr HTTP requests --
including those coming in from the web tier.
* Users are unable to search. The problem might self-correct once the
merge is complete, but that could be quite a while.

What are my options for changing Solr settings or changing my indexing
process to avoid this lockup scenario? Do you agree that the segment
merge is helping cause the lockup? Do adds and deletes really need to
block on segment merges?

Partial thread dumps follow, showing example add and delete threads
that are blocked. Also the active Lucene Merge Thread, and the thread
that kicked off the merge.

[doc deletion thread, waiting for DirectUpdateHandler2.iwCommit.lock()
to return]
"http-1800-200" daemon prio=6 tid=0x000000000a58cc00 nid=0x1028
waiting on condition [0x000000000f9ae000..0x000000000f9afa90]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000016d801ae0> (a
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown
Source)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown
Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown
Source)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(Unknown
Source)
        at 
org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:320)
        at 
org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:71)
        at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:234)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:180)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
        at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
        at java.lang.Thread.run(Unknown Source)

[doc adding thread, waiting for DirectUpdateHandler2.iwAccess.lock() to return]
"http-1800-70" daemon prio=6 tid=0x0000000007946400 nid=0x590 waiting
on condition [0x000000000d76e000..0x000000000d76f910]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000016d801ae0> (a
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown
Source)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown
Source)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown
Source)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown
Source)
        at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:211)
        at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:118)
        at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:123)
        at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:192)
        at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
        at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
        at java.lang.Thread.run(Unknown Source)

All 200 front-line Solr Tomcat threads, from http-1800-1 through
http-1800-200, are blocked in one of these two ways.

[The thread that kicked off the commit/merge. Note that
DirectUpdateHandler2.commit() calls iwCommit.lock(), and seems not to
release the iwCommit lock until DIH2.commit() returns.]
"pool-15-thread-1" prio=6 tid=0x000000000abc3c00 nid=0x1094 in
Object.wait() [0x000000000b6ff000..0x000000000b6ff910]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000001b1ba84a0> (a 
org.apache.solr.update.SolrIndexWriter)
        at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5351)
        - locked <0x00000001b1ba84a0> (a org.apache.solr.update.SolrIndexWriter)
        at 
org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:3479)
        - locked <0x00000001b1ba84a0> (a org.apache.solr.update.SolrIndexWriter)
        at 
org.apache.lucene.index.IndexWriter.finishMerges(IndexWriter.java:3463)
        - locked <0x00000001b1ba84a0> (a org.apache.solr.update.SolrIndexWriter)
        at 
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:2200)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2153)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2117)
        at 
org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:230)
        at 
org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:181)
        at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
        at 
org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run(DirectUpdateHandler2.java:602)
        - locked <0x000000016d801760> (a
org.apache.solr.update.DirectUpdateHandler2$CommitTracker)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
Source)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

[The only active Lucene merge thread]
"Lucene Merge Thread #0" daemon prio=6 tid=0x000000000930ac00
nid=0xb38 runnable [0x000000000a41e000..0x000000000a41f790]
   java.lang.Thread.State: RUNNABLE
        at java.io.RandomAccessFile.readBytes(Native Method)
        at java.io.RandomAccessFile.read(Unknown Source)
        at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(SimpleFSDirectory.java:132)
        - locked <0x00000001b443e588> (a
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor)
        at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
        at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
        at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
        at 
org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:73)
        at 
org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)
        at 
org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:707)
        at 
org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:648)
        at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:586)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:154)
        at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5029)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
        at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
        at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)

Reply via email to