If we have more than half a dozen processes posting updates to our Solr
instance (each about 1/sec), we get regular and fairly reproducible solr
hangs, requiring a kill -9 of the solr process. We're running Solr 3.4, a
fairly out-of-the-box solr/jetty setup.

I've got a thread dump and some GC logging information from the latest one.
 I haven't programmed in java for a decade, and any help decoding these
dumps would be hugely appreciated.

When I first was alerted that the process was hanging, this is the count of
what threads were doing what:

    33   java.lang.Thread.State: BLOCKED (on object monitor)
     16   java.lang.Thread.State: RUNNABLE
      2   java.lang.Thread.State: TIMED_WAITING (on object monitor)
      1   java.lang.Thread.State: WAITING (on object monitor)
     77   java.lang.Thread.State: WAITING (parking)

A couple minutes later it looked like this:

    29      java.lang.Thread.State: BLOCKED (on object monitor)
      8      java.lang.Thread.State: RUNNABLE
      1      java.lang.Thread.State: TIMED_WAITING (on object monitor)
      1      java.lang.Thread.State: TIMED_WAITING (parking)
      1      java.lang.Thread.State: TIMED_WAITING (sleeping)
      3      java.lang.Thread.State: WAITING (on object monitor)
      1      java.lang.Thread.State: WAITING (parking)

All the blocked threads looked like this:

java.lang.Thread.State: BLOCKED (on object monitor)
   at java.util.Collections$SynchronizedMap.get(Collections.java:1975)
   - waiting to lock <0x000000072551f908> (a
java.util.Collections$SynchronizedMap)
   at org.apache.pdfbox.util.PDFOperator.getOperator(PDFOperator.java:68)
   at
org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:441)
   at
org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:46)
   at
org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:175)
   at
org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:187)
   at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:266)
   at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
   at
org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
   at
org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
   at
org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
   at
org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
   at
org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:242)
   at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:53)
   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:90)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
   at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
   at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:213)
   ...

and yes, that 0x000000072551f908 address is the same for each thread that's
blocked.

In the first case where there were 77 threads waiting (parking),  they were
all

2012-08-30T20:23:55-0700     java.lang.Thread.State: WAITING (parking)
2012-08-30T20:23:55-0700        at sun.misc.Unsafe.park(Native Method)
2012-08-30T20:23:55-0700        - parking to wait for  <0x0000000726089700>
(a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
2012-08-30T20:23:55-0700        at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
2012-08-30T20:23:55-0700        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
2012-08-30T20:23:55-0700        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
2012-08-30T20:23:55-0700        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
2012-08-30T20:23:55-0700        at
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
2012-08-30T20:23:55-0700        at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:214)
2012-08-30T20:23:55-0700        at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
2012-08-30T20:23:55-0700        at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
2012-08-30T20:23:55-0700        at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158)
2012-08-30T20:23:55-0700        at
org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
2012-08-30T20:23:55-0700        at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
...

and yes, they're all waiting for 0x0000000726089700


A gist of the first thread dump, with most of the repetition redacted, is
here: https://gist.github.com/3548920

Any suggestions would be super helpful.  Thanks!



-- 
Kevin M. Goess
Software Engineer
Berkeley Electronic Press

Reply via email to