I don't know much about how Solr does its locking, so I'm guessing below:

It looks like one thread is doing a commit, by closing the writer, and
is likely holding a lock that prevents other (add/delete) ops from
running? Probably this lock is held because the writer is in the
process of being closed, and on close, the write waits for running
merges to complete, so it can take a very long time if a large merge
is running.

And then your while loop keeps using up another of the 200 threads,
blocking on the add/delete request.

I think Solr could, instead, call IndexWriter.finishMerges, without
holding the lock, and then perhaps IndexWriter.close(false), which
would be fast (ie, aborts any running merges, for the race condition
where another merge just started after finishMerges and before close).
 Alternatively, Solr could call IndexWriter.commit, not
IndexWriter.close, and not hold the lock that prevents add/deletes
(but maybe there are other reasons why the IW must be closed?).

Maybe Solr should also have a way to restrict the max # threads to be
used for pending add/delete ops, so that there are always thread free
in the app server's pool for searching?

Or... maybe you could drastically increase the timeout on your client
side HTTP connections?  Or, is there some way to check how many
threads are "tied up" in Solr and block your add/delete requests when
this gets too large...?

Mike

On Thu, Apr 22, 2010 at 6:28 PM, Chris Harris <rygu...@gmail.com> wrote:
> I'm running Solr 1.4+ under Tomcat 6, with indexing and searching
> requests simultaneously hitting the same Solr machine. Sometimes Solr,
> Tomcat, and my (C#) indexing process conspire to render search
> inoperable. So far I've only noticed this while big segment merges
> (i.e. merges that take multiple minutes) are taking place.
>
> Let me explain the situation as best as I understand it.
>
> My indexer has a main loop that looks roughly like this:
>
>  while true:
>    try:
>      submit a new add or delete request to Solr via HTTP
>    catch timeoutException:
>      sleep a few seconds
>
> When things are going wrong (i.e., when a large segment merge is
> happening), this loop is problematic:
>
> * When the indexer's request hits Solr, then the corresponding thread
> in Tomcat blocks. (It looks to me like the thread is destined to block
> until the entire merge is complete. I'll paste in what the Java stack
> traces look like at the end of the message if they can help diagnose
> things.)
> * Because the Solr thread stays blocked for so long, eventually the
> indexer hits a timeoutException. (That is, it gives up on Solr.)
> * Hitting the timeout exception doesn't cause the corresponding Tomcat
> thread to die or unblock. Therefore, each time through the loop,
> another Solr-handling thread inside Tomcat enters a blocked state.
> * Eventually so many threads (maxThreads, whose Tomcat default is 200)
> are blocked that Tomcat starts rejecting all new Solr HTTP requests --
> including those coming in from the web tier.
> * Users are unable to search. The problem might self-correct once the
> merge is complete, but that could be quite a while.
>
> What are my options for changing Solr settings or changing my indexing
> process to avoid this lockup scenario? Do you agree that the segment
> merge is helping cause the lockup? Do adds and deletes really need to
> block on segment merges?
>
> Partial thread dumps follow, showing example add and delete threads
> that are blocked. Also the active Lucene Merge Thread, and the thread
> that kicked off the merge.
>
> [doc deletion thread, waiting for DirectUpdateHandler2.iwCommit.lock()
> to return]
> "http-1800-200" daemon prio=6 tid=0x000000000a58cc00 nid=0x1028
> waiting on condition [0x000000000f9ae000..0x000000000f9afa90]
>   java.lang.Thread.State: WAITING (parking)
>        at sun.misc.Unsafe.park(Native Method)
>        - parking to wait for  <0x000000016d801ae0> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
>        at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown
> Source)
>        at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown
> Source)
>        at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown
> Source)
>        at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(Unknown
> Source)
>        at 
> org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:320)
>        at 
> org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:71)
>        at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:234)
>        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:180)
>        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>        at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>        at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>        at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>        at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>        at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>        at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>        at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>        at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>        at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>        at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>        at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>        at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>        at java.lang.Thread.run(Unknown Source)
>
> [doc adding thread, waiting for DirectUpdateHandler2.iwAccess.lock() to 
> return]
> "http-1800-70" daemon prio=6 tid=0x0000000007946400 nid=0x590 waiting
> on condition [0x000000000d76e000..0x000000000d76f910]
>   java.lang.Thread.State: WAITING (parking)
>        at sun.misc.Unsafe.park(Native Method)
>        - parking to wait for  <0x000000016d801ae0> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
>        at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown
> Source)
>        at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown
> Source)
>        at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown
> Source)
>        at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown
> Source)
>        at 
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:211)
>        at 
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>        at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:118)
>        at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:123)
>        at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:192)
>        at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>        at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>        at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>        at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>        at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>        at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>        at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>        at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>        at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>        at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>        at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>        at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>        at java.lang.Thread.run(Unknown Source)
>
> All 200 front-line Solr Tomcat threads, from http-1800-1 through
> http-1800-200, are blocked in one of these two ways.
>
> [The thread that kicked off the commit/merge. Note that
> DirectUpdateHandler2.commit() calls iwCommit.lock(), and seems not to
> release the iwCommit lock until DIH2.commit() returns.]
> "pool-15-thread-1" prio=6 tid=0x000000000abc3c00 nid=0x1094 in
> Object.wait() [0x000000000b6ff000..0x000000000b6ff910]
>   java.lang.Thread.State: TIMED_WAITING (on object monitor)
>        at java.lang.Object.wait(Native Method)
>        - waiting on <0x00000001b1ba84a0> (a 
> org.apache.solr.update.SolrIndexWriter)
>        at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5351)
>        - locked <0x00000001b1ba84a0> (a 
> org.apache.solr.update.SolrIndexWriter)
>        at 
> org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:3479)
>        - locked <0x00000001b1ba84a0> (a 
> org.apache.solr.update.SolrIndexWriter)
>        at 
> org.apache.lucene.index.IndexWriter.finishMerges(IndexWriter.java:3463)
>        - locked <0x00000001b1ba84a0> (a 
> org.apache.solr.update.SolrIndexWriter)
>        at 
> org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:2200)
>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2153)
>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2117)
>        at 
> org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:230)
>        at 
> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:181)
>        at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
>        at 
> org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run(DirectUpdateHandler2.java:602)
>        - locked <0x000000016d801760> (a
> org.apache.solr.update.DirectUpdateHandler2$CommitTracker)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>        at java.util.concurrent.FutureTask.run(Unknown Source)
>        at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
> Source)
>        at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
> Source)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
> Source)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>        at java.lang.Thread.run(Unknown Source)
>
> [The only active Lucene merge thread]
> "Lucene Merge Thread #0" daemon prio=6 tid=0x000000000930ac00
> nid=0xb38 runnable [0x000000000a41e000..0x000000000a41f790]
>   java.lang.Thread.State: RUNNABLE
>        at java.io.RandomAccessFile.readBytes(Native Method)
>        at java.io.RandomAccessFile.read(Unknown Source)
>        at 
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(SimpleFSDirectory.java:132)
>        - locked <0x00000001b443e588> (a
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor)
>        at 
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
>        at 
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
>        at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
>        at 
> org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:73)
>        at 
> org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)
>        at 
> org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:707)
>        at 
> org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:648)
>        at 
> org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:586)
>        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:154)
>        at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5029)
>        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
>        at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
>        at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
>

Reply via email to