If we have more than half a dozen processes posting updates to our Solr instance (each about 1/sec), we get regular and fairly reproducible solr hangs, requiring a kill -9 of the solr process. We're running Solr 3.4, a fairly out-of-the-box solr/jetty setup.
I've got a thread dump and some GC logging information from the latest one. I haven't programmed in java for a decade, and any help decoding these dumps would be hugely appreciated. When I first was alerted that the process was hanging, this is the count of what threads were doing what: 33 java.lang.Thread.State: BLOCKED (on object monitor) 16 java.lang.Thread.State: RUNNABLE 2 java.lang.Thread.State: TIMED_WAITING (on object monitor) 1 java.lang.Thread.State: WAITING (on object monitor) 77 java.lang.Thread.State: WAITING (parking) A couple minutes later it looked like this: 29 java.lang.Thread.State: BLOCKED (on object monitor) 8 java.lang.Thread.State: RUNNABLE 1 java.lang.Thread.State: TIMED_WAITING (on object monitor) 1 java.lang.Thread.State: TIMED_WAITING (parking) 1 java.lang.Thread.State: TIMED_WAITING (sleeping) 3 java.lang.Thread.State: WAITING (on object monitor) 1 java.lang.Thread.State: WAITING (parking) All the blocked threads looked like this: java.lang.Thread.State: BLOCKED (on object monitor) at java.util.Collections$SynchronizedMap.get(Collections.java:1975) - waiting to lock <0x000000072551f908> (a java.util.Collections$SynchronizedMap) at org.apache.pdfbox.util.PDFOperator.getOperator(PDFOperator.java:68) at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:441) at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:46) at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:175) at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:187) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:266) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251) at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225) at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442) at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366) at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322) at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:242) at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:53) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:90) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:213) ... and yes, that 0x000000072551f908 address is the same for each thread that's blocked. In the first case where there were 77 threads waiting (parking), they were all 2012-08-30T20:23:55-0700 java.lang.Thread.State: WAITING (parking) 2012-08-30T20:23:55-0700 at sun.misc.Unsafe.park(Native Method) 2012-08-30T20:23:55-0700 - parking to wait for <0x0000000726089700> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync) 2012-08-30T20:23:55-0700 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 2012-08-30T20:23:55-0700 at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) 2012-08-30T20:23:55-0700 at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941) 2012-08-30T20:23:55-0700 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261) 2012-08-30T20:23:55-0700 at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594) 2012-08-30T20:23:55-0700 at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:214) 2012-08-30T20:23:55-0700 at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) 2012-08-30T20:23:55-0700 at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115) 2012-08-30T20:23:55-0700 at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158) 2012-08-30T20:23:55-0700 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79) 2012-08-30T20:23:55-0700 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67) ... and yes, they're all waiting for 0x0000000726089700 A gist of the first thread dump, with most of the repetition redacted, is here: https://gist.github.com/3548920 Any suggestions would be super helpful. Thanks! -- Kevin M. Goess Software Engineer Berkeley Electronic Press