Not sure if these help. Here's the stack trace and jmap -histo output during a long (bad) commit Full thread dump Java HotSpot(TM) 64-Bit Server VM (11.0-b16 mixed mode):
"Attach Listener" daemon prio=10 tid=0x00002aabf9954400 nid=0x5e1c runnable [0x0000000000000000..0x0000000042048d20] java.lang.Thread.State: RUNNABLE "Lucene Merge Thread #0" daemon prio=10 tid=0x00002aabf9c7f400 nid=0x5e00 runnable [0x0000000041f47000..0x0000000041f47ba0] java.lang.Thread.State: RUNNABLE at org.apache.lucene.store.RAMOutputStream.flush(RAMOutputStream.java:147) at org.apache.lucene.store.RAMOutputStream.writeTo(RAMOutputStream.java:56) at org.apache.lucene.index.MultiLevelSkipListWriter.writeSkip(MultiLevelSkipListWriter.java:146) at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:588) at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:546) at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:499) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4291) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3932) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260) "pool-2-thread-1" prio=10 tid=0x00002aabf9b94c00 nid=0x5dc3 waiting on condition [0x0000000041a42000..0x0000000041a42aa0] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aaabecd12d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source) at java.util.concurrent.DelayQueue.take(Unknown Source) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) "DestroyJavaVM" prio=10 tid=0x00002aabfc006c00 nid=0x5da3 waiting on condition [0x0000000000000000..0x000000004022ad10] java.lang.Thread.State: RUNNABLE "Timer-2" daemon prio=10 tid=0x00002aabfc007c00 nid=0x5dc1 in Object.wait() [0x0000000041e46000..0x0000000041e46d20] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabef91338> (a java.util.TaskQueue) at java.util.TimerThread.mainLoop(Unknown Source) - locked <0x00002aaabef91338> (a java.util.TaskQueue) at java.util.TimerThread.run(Unknown Source) "Checkpointer" daemon prio=10 tid=0x00002aabf995b800 nid=0x5dc0 in Object.wait() [0x0000000041d45000..0x0000000041d45da0] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabebf9700> (a java.lang.Object) at java.lang.Object.wait(Object.java:485) at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:152) - locked <0x00002aaabebf9700> (a java.lang.Object) at java.lang.Thread.run(Unknown Source) "Cleaner-1" daemon prio=10 tid=0x00002aabf995b000 nid=0x5dbf in Object.wait() [0x0000000041c44000..0x0000000041c44c20] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabebee158> (a java.lang.Object) at java.lang.Object.wait(Object.java:485) at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:152) - locked <0x00002aaabebee158> (a java.lang.Object) at java.lang.Thread.run(Unknown Source) "INCompressor" daemon prio=10 tid=0x00002aabf9a3cc00 nid=0x5dbe in Object.wait() [0x0000000041b43000..0x0000000041b43ca0] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabeef3500> (a java.lang.Object) at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154) - locked <0x00002aaabeef3500> (a java.lang.Object) at java.lang.Thread.run(Unknown Source) "pool-1-thread-1" prio=10 tid=0x00002aabfa015800 nid=0x5dbc waiting on condition [0x0000000041941000..0x0000000041941ba0] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aaabef91a08> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(Unknown Source) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source) at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) "Timer-1" daemon prio=10 tid=0x00002aabfa0f1000 nid=0x5dbb in Object.wait() [0x0000000041840000..0x0000000041840a20] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabef19a48> (a java.util.TaskQueue) at java.util.TimerThread.mainLoop(Unknown Source) - locked <0x00002aaabef19a48> (a java.util.TaskQueue) at java.util.TimerThread.run(Unknown Source) "btpool0-9 - Acceptor0 selectchannelconnec...@0.0.0.0:80" prio=10 tid=0x00002aabf9807800 nid=0x5dba runnable [0x000000004173f000..0x000000004173faa0] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(Unknown Source) at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source) at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source) - locked <0x00002aaabebcb2a8> (a sun.nio.ch.Util$1) - locked <0x00002aaabebcb290> (a java.util.Collections$UnmodifiableSet) - locked <0x00002aaabebc8b00> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(Unknown Source) at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:411) at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:167) at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124) at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488) "btpool0-8 - Acceptor1 selectchannelconnec...@0.0.0.0:80" prio=10 tid=0x00002aabf980d000 nid=0x5db9 runnable [0x000000004163e000..0x000000004163ed20] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(Unknown Source) at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source) at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source) - locked <0x00002aaabebcb5f8> (a sun.nio.ch.Util$1) - locked <0x00002aaabebcb5e0> (a java.util.Collections$UnmodifiableSet) - locked <0x00002aaabebc4d30> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(Unknown Source) at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:411) at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:167) at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124) at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488) "btpool0-7" prio=10 tid=0x00002aabf9594800 nid=0x5db8 in Object.wait() [0x000000004153c000..0x000000004153dda0] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaad9d0c3f0> (a org.apache.solr.update.SolrIndexWriter) at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4554) - locked <0x00002aaad9d0c3f0> (a org.apache.solr.update.SolrIndexWriter) at org.apache.lucene.index.IndexWriter.finishMerges(IndexWriter.java:2873) - locked <0x00002aaad9d0c3f0> (a org.apache.solr.update.SolrIndexWriter) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1718) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1674) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1648) at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153) at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:175) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:359) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77) at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226) at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1088) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:843) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:647) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:205) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488) "btpool0-6" prio=10 tid=0x00002aabf98fdc00 nid=0x5db7 in Object.wait() [0x000000004143c000..0x000000004143cc20] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabf15c7b0> (a org.mortbay.thread.QueuedThreadPool$PoolThread) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541) - locked <0x00002aaabf15c7b0> (a org.mortbay.thread.QueuedThreadPool$PoolThread) "btpool0-5" prio=10 tid=0x00002aabf9b10000 nid=0x5db6 in Object.wait() [0x000000004133b000..0x000000004133bca0] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabf160ab8> (a org.mortbay.thread.QueuedThreadPool$PoolThread) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541) - locked <0x00002aaabf160ab8> (a org.mortbay.thread.QueuedThreadPool$PoolThread) "btpool0-4" prio=10 tid=0x00002aabf9b97400 nid=0x5db5 in Object.wait() [0x000000004123a000..0x000000004123ab20] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabf160d08> (a org.mortbay.thread.QueuedThreadPool$PoolThread) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541) - locked <0x00002aaabf160d08> (a org.mortbay.thread.QueuedThreadPool$PoolThread) "btpool0-3" prio=10 tid=0x00002aabf9b96000 nid=0x5db4 in Object.wait() [0x0000000041139000..0x0000000041139ba0] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabf160f58> (a org.mortbay.thread.QueuedThreadPool$PoolThread) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541) - locked <0x00002aaabf160f58> (a org.mortbay.thread.QueuedThreadPool$PoolThread) "btpool0-2" prio=10 tid=0x00002aabf9a07000 nid=0x5db3 in Object.wait() [0x0000000041038000..0x0000000041038a20] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabf1611a8> (a org.mortbay.thread.QueuedThreadPool$PoolThread) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541) - locked <0x00002aaabf1611a8> (a org.mortbay.thread.QueuedThreadPool$PoolThread) "btpool0-1" prio=10 tid=0x00002aabf9324400 nid=0x5db2 in Object.wait() [0x0000000040f37000..0x0000000040f37aa0] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabf1613f8> (a org.mortbay.thread.QueuedThreadPool$PoolThread) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541) - locked <0x00002aaabf1613f8> (a org.mortbay.thread.QueuedThreadPool$PoolThread) "btpool0-0" prio=10 tid=0x00002aabf9a83800 nid=0x5db1 in Object.wait() [0x0000000040e36000..0x0000000040e36d20] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabf161648> (a org.mortbay.thread.QueuedThreadPool$PoolThread) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541) - locked <0x00002aaabf161648> (a org.mortbay.thread.QueuedThreadPool$PoolThread) "Timer-0" daemon prio=10 tid=0x00002aabf9af6000 nid=0x5db0 in Object.wait() [0x0000000040d35000..0x0000000040d35da0] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabebacf80> (a java.util.TaskQueue) at java.util.TimerThread.mainLoop(Unknown Source) - locked <0x00002aaabebacf80> (a java.util.TaskQueue) at java.util.TimerThread.run(Unknown Source) "Low Memory Detector" daemon prio=10 tid=0x00002aabf933fc00 nid=0x5dac runnable [0x0000000000000000..0x0000000000000000] java.lang.Thread.State: RUNNABLE "CompilerThread1" daemon prio=10 tid=0x00002aabf933d000 nid=0x5dab waiting on condition [0x0000000000000000..0x0000000040a31360] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x00002aabf9339000 nid=0x5daa waiting on condition [0x0000000000000000..0x00000000409303e0] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x00002aabf9337400 nid=0x5da9 waiting on condition [0x0000000000000000..0x00000000408306b0] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x00002aabf9314400 nid=0x5da8 in Object.wait() [0x000000004072f000..0x000000004072faa0] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabeb86f50> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(Unknown Source) - locked <0x00002aaabeb86f50> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(Unknown Source) at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source) "Reference Handler" daemon prio=10 tid=0x00002aabf9312800 nid=0x5da7 in Object.wait() [0x000000004062e000..0x000000004062ed20] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaabeb86ec8> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source) - locked <0x00002aaabeb86ec8> (a java.lang.ref.Reference$Lock) "VM Thread" prio=10 tid=0x00002aabf917a000 nid=0x5da6 runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x000000004011c800 nid=0x5da4 runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x000000004011e000 nid=0x5da5 runnable "VM Periodic Task Thread" prio=10 tid=0x00002aabf9342000 nid=0x5dad waiting on condition JNI global references: 971 Heap PSYoungGen total 1395264K, used 965841K [0x00002aab8d4b0000, 0x00002aabf7f50000, 0x00002aabf7f50000) eden space 1030400K, 93% used [0x00002aab8d4b0000,0x00002aabc83d4788,0x00002aabcc2f0000) from space 364864K, 0% used [0x00002aabe1b00000,0x00002aabe1b10000,0x00002aabf7f50000) to space 352320K, 0% used [0x00002aabcc2f0000,0x00002aabcc2f0000,0x00002aabe1b00000) PSOldGen total 3495296K, used 642758K [0x00002aaab7f50000, 0x00002aab8d4b0000, 0x00002aab8d4b0000) object space 3495296K, 18% used [0x00002aaab7f50000,0x00002aaadf301a78,0x00002aab8d4b0000) PSPermGen total 21248K, used 19258K [0x00002aaaaff50000, 0x00002aaab1410000, 0x00002aaab7f50000) object space 21248K, 90% used [0x00002aaaaff50000,0x00002aaab121e8d8,0x00002aaab1410000) num #instances #bytes class name ---------------------------------------------- 1: 6459678 491568792 [C 2: 6456059 258242360 java.lang.String 3: 6282264 251290560 org.apache.lucene.index.TermInfo 4: 6282189 201030048 org.apache.lucene.index.Term 5: 70220 39109632 [I 6: 6082 25264288 [B 7: 149 20355504 [J 8: 133 20354208 [Lorg.apache.lucene.index.Term; 9: 133 20354208 [Lorg.apache.lucene.index.TermInfo; 10: 160230 8972880 java.nio.HeapByteBuffer 11: 160218 8972208 java.nio.HeapCharBuffer 12: 160210 8971760 org.apache.lucene.index.FieldsReader$FieldForMerge 13: 30440 4095480 <constMethodKlass> 14: 30440 3660128 <methodKlass> 15: 2605 3026184 <constantPoolKlass> 16: 22065 3025120 [Ljava.lang.Object; 17: 1297 2411792 [Ljava.util.HashMap$Entry; 18: 48691 2309696 <symbolKlass> 19: 2604 1981728 <instanceKlassKlass> 20: 2194 1889888 <constantPoolCacheKlass> 21: 27444 1317312 java.util.HashMap$Entry 22: 24954 998160 java.util.AbstractList$Itr 23: 18834 753360 org.apache.lucene.index.FieldInfo 24: 2846 523664 java.lang.Class 25: 13021 520840 java.util.ArrayList 26: 12471 399072 org.apache.lucene.document.Document 27: 3895 372216 [[I 28: 3904 309592 [S 29: 534 249632 <methodDataKlass> 30: 3451 220864 org.apache.lucene.index.SegmentReader$Norm 31: 1547 136136 org.apache.lucene.store.FSDirectory$FSIndexInput 32: 213 120984 <objArrayKlassKlass> 33: 737 112024 java.lang.reflect.Method 34: 1575 100800 java.lang.ref.Finalizer 35: 1345 86080 org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor 36: 1188 76032 java.util.HashMap ... On Fri, Jan 2, 2009 at 11:20 AM, Brian Whitman <br...@echonest.com> wrote: > We have a distributed setup that has been experiencing glacially slow > commit times on only some of the shards. (10s on a good shard, 263s on a > slow shard.) Each shard for this index has about 10GB of lucene index data > and the documents are segregated by an md5 hash, so the distribution of > document/data types should be equal across all shards. I've turned off our > postcommit hooks to isolate the problem, so it's not a snapshot run amok or > anything. I also moved the indexes over to new machines and the same indexes > that were slow in production are also slow on the test machines. > During the slow commit, the jetty process is 100% CPU / 50% RAM on a 8GB > quad core machine. The slow commit happens every time after I add at least > one document. (If I don't add any documents the commit is immediate.) > > What can I do to look into this problem? > > > >