I'm wondering if this is relevant: https://issues.apache.org/jira/browse/LUCENE-2680 - Improve how IndexWriter flushes deletes against existing segments
Roman

On Fri, Oct 28, 2011 at 11:38 AM, Roman Alekseenkov <ralekseen...@gmail.com> wrote:
> Hi everyone,
>
> I'm looking for some help with Solr indexing issues at a large scale.
>
> We are indexing a few terabytes per month on a sizeable Solr cluster (8
> masters serving writes, 16 slaves serving reads). After a certain
> amount of tuning we got to the point where a single Solr instance can
> handle an index size of 100GB without many issues, but beyond that we
> start to observe noticeable delays on index flush, and they keep
> getting longer. See the attached picture for details; it was captured
> for a single JVM on a single machine.
>
> We are posting data in 8 threads using the javabin format, committing
> every 5K documents, with a merge factor of 20 and a RAM buffer size of
> about 384MB. The picture shows that single-threaded index flushing
> code kicks in on every commit and blocks all other indexing threads.
> The hardware is decent (12 physical / 24 virtual cores per machine)
> and it is mostly idle while the index is flushing: very little CPU
> utilization and disk I/O (<5%), with the exception of a single CPU
> core which actually does the index flush (95% CPU, 5% I/O wait).
>
> My questions are:
>
> 1) Will the Solr changes from the real-time branch help to resolve
> these issues? I was reading
> http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html
> and it looks like we have exactly the same problem.
>
> 2) What would be the best way to port these (and only these) changes
> to 3.4.0? I tried to dig into the branching and revisions, but got
> lost quickly. I tried something like "svn diff
> […]realtime_search@r953476 […]realtime_search@r1097767", but I'm not
> sure if it's even possible to merge these into 3.4.0.
>
> 3) What would you recommend for production 24/7 use? 3.4.0?
>
> 4) Is there a workaround that can be used? I have also included the
> stack trace below.
>
> Thank you!
> Roman
>
> P.S.
> This single "index flushing" thread spends 99% of its time in
> "org.apache.lucene.index.BufferedDeletesStream.applyDeletes", and then
> the merge seems to go quickly. I looked it up, and it looks like the
> intent here is deleting old commit points (we are keeping only 1
> non-optimized commit point per config). I'm not sure why it is taking
> that long.
>
> pool-2-thread-1 [RUNNABLE] CPU time: 3:31
>     java.nio.Bits.copyToByteArray(long, Object, long, long)
>     java.nio.DirectByteBuffer.get(byte[], int, int)
>     org.apache.lucene.store.MMapDirectory$MMapIndexInput.readBytes(byte[], int, int)
>     org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
>     org.apache.lucene.index.SegmentTermEnum.next()
>     org.apache.lucene.index.TermInfosReader.<init>(Directory, String, FieldInfos, int, int)
>     org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentReader, Directory, SegmentInfo, int, int)
>     org.apache.lucene.index.SegmentReader.get(boolean, Directory, SegmentInfo, int, boolean, int)
>     org.apache.lucene.index.IndexWriter$ReaderPool.get(SegmentInfo, boolean, int, int)
>     org.apache.lucene.index.IndexWriter$ReaderPool.get(SegmentInfo, boolean)
>     org.apache.lucene.index.BufferedDeletesStream.applyDeletes(IndexWriter$ReaderPool, List)
>     org.apache.lucene.index.IndexWriter.doFlush(boolean)
>     org.apache.lucene.index.IndexWriter.flush(boolean, boolean)
>     org.apache.lucene.index.IndexWriter.closeInternal(boolean)
>     org.apache.lucene.index.IndexWriter.close(boolean)
>     org.apache.lucene.index.IndexWriter.close()
>     org.apache.solr.update.SolrIndexWriter.close()
>     org.apache.solr.update.DirectUpdateHandler2.closeWriter()
>     org.apache.solr.update.DirectUpdateHandler2.commit(CommitUpdateCommand)
>     org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run()
>     java.util.concurrent.Executors$RunnableAdapter.call()
>     java.util.concurrent.FutureTask$Sync.innerRun()
>     java.util.concurrent.FutureTask.run()
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor$ScheduledFutureTask)
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
>     java.util.concurrent.ThreadPoolExecutor$Worker.run()
>     java.lang.Thread.run()
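A rough way to see why the flush cost could grow with index size: the trace shows `BufferedDeletesStream.applyDeletes` opening a `SegmentReader` (and loading its terms index via `TermInfosReader.<init>`) for existing segments. Assuming Solr 3.x implements an add with a `uniqueKey` as `IndexWriter.updateDocument`, every added document buffers a delete-by-term on its id, and at flush time those buffered terms have to be checked against every existing segment, whether or not they match anything. The Java sketch below is a simplified, hypothetical model of that cost, not Lucene's code: `FlushCostModel`, `Segment`, `FlushStats` and the segment/document counts are made up for illustration, and "opening" a segment only stands in for loading its terms index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model of applying buffered delete-by-term at flush. Not Lucene code. */
public class FlushCostModel {

    /** A toy "segment": maps a unique-key term to the doc ids containing it. */
    static final class Segment {
        final Map<String, List<Integer>> postings = new HashMap<>();
        final Set<Integer> deleted = new HashSet<>();
        int nextDoc = 0;

        void add(String id) {
            postings.computeIfAbsent(id, k -> new ArrayList<>()).add(nextDoc++);
        }
    }

    /** Counters for the work done by one flush. */
    static final class FlushStats {
        int segmentsOpened; // stands in for loading each segment's terms index
        int termLookups;    // one lookup per (segment, buffered term) pair
        int docsDeleted;    // documents newly marked deleted
    }

    /**
     * Applies every buffered delete term to every existing segment.
     * Work is segments x bufferedTerms lookups, however few docs match.
     */
    static FlushStats applyDeletes(List<Segment> segments, Set<String> bufferedDeleteTerms) {
        FlushStats stats = new FlushStats();
        for (Segment seg : segments) {
            stats.segmentsOpened++;
            for (String term : bufferedDeleteTerms) {
                stats.termLookups++;
                List<Integer> docs = seg.postings.get(term);
                if (docs == null) {
                    continue; // term not in this segment; the lookup still cost us
                }
                for (int doc : docs) {
                    if (seg.deleted.add(doc)) {
                        stats.docsDeleted++;
                    }
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        // Hypothetical index: 40 existing segments of 1,000 docs each.
        List<Segment> segments = new ArrayList<>();
        for (int s = 0; s < 40; s++) {
            Segment seg = new Segment();
            for (int d = 0; d < 1000; d++) {
                seg.add("doc-" + (s * 1000 + d));
            }
            segments.add(seg);
        }
        // One commit's worth of adds (5K docs), each buffering a delete on its id.
        // Only the first 100 ids already exist; the rest are new documents.
        Set<String> buffered = new HashSet<>();
        for (int i = 0; i < 5000; i++) {
            buffered.add(i < 100 ? "doc-" + i : "new-" + i);
        }
        FlushStats stats = applyDeletes(segments, buffered);
        System.out.println("segments opened: " + stats.segmentsOpened);
        System.out.println("term lookups:    " + stats.termLookups);
        System.out.println("docs deleted:    " + stats.docsDeleted);
    }
}
```

With these made-up numbers the model opens all 40 segments and performs 40 × 5,000 = 200,000 term lookups to delete just 100 documents. Because the work scales with segments × buffered terms, it keeps growing as the index grows even when the number of real overwrites stays flat.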