large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Roman Alekseenkov
Hi everyone,

I'm looking for some help with Solr indexing issues on a large scale.

We are indexing a few terabytes per month on a sizeable Solr cluster (8
masters serving writes, 16 slaves serving reads). After a certain
amount of tuning we got to the point where a single Solr instance can
handle an index size of 100GB without much trouble, but beyond that we
start to observe noticeable delays on index flush, and they keep
getting larger. See the attached picture for details; it was captured
for a single JVM on a single machine.

We are posting data in 8 threads using the javabin format, committing
every 5K documents, with a merge factor of 20 and a RAM buffer size of
about 384MB. From the picture it can be seen that single-threaded index
flushing code kicks in on every commit and blocks all other indexing
threads. The hardware is decent (12 physical / 24 virtual cores per
machine) and it is mostly idle while the index is flushing: very little
CPU utilization and disk I/O (<5%), with the exception of the single
CPU core that actually does the index flush (95% CPU, 5% I/O wait).
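
The indexing configuration described above would correspond roughly to the following solrconfig.xml fragment (a sketch against the Solr 3.x indexDefaults section; the values come from this message, but the exact placement and surrounding elements are assumptions):

```xml
<indexDefaults>
  <!-- values from the message above -->
  <ramBufferSizeMB>384</ramBufferSizeMB>
  <mergeFactor>20</mergeFactor>
</indexDefaults>
```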

My questions are:

1) Will the Solr changes from the real-time branch help resolve these
issues? I was reading
http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html
and it looks like we have exactly the same problem.

2) What would be the best way to port these (and only these) changes
to 3.4.0? I tried to dig into the branching and revisions, but got
lost quickly. I tried something like "svn diff
[…]realtime_search@r953476 […]realtime_search@r1097767", but I'm not
sure it's even possible to merge these into 3.4.0.

3) What would you recommend for production 24/7 use? 3.4.0?

4) Is there a workaround that can be used? I have also listed the stack
trace below.

Thank you!
Roman

P.S. This single "index flushing" thread spends 99% of its time in
"org.apache.lucene.index.BufferedDeletesStream.applyDeletes", and then
the merge seems to go quickly. I looked it up, and it looks like the
intent here is deleting old commit points (we are keeping only 1
non-optimized commit point per config). Not sure why it is taking that
long.

pool-2-thread-1 [RUNNABLE] CPU time: 3:31
java.nio.Bits.copyToByteArray(long, Object, long, long)
java.nio.DirectByteBuffer.get(byte[], int, int)
org.apache.lucene.store.MMapDirectory$MMapIndexInput.readBytes(byte[], int, int)
org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
org.apache.lucene.index.SegmentTermEnum.next()
org.apache.lucene.index.TermInfosReader.<init>(Directory, String, FieldInfos, int, int)
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentReader, Directory, SegmentInfo, int, int)
org.apache.lucene.index.SegmentReader.get(boolean, Directory, SegmentInfo, int, boolean, int)
org.apache.lucene.index.IndexWriter$ReaderPool.get(SegmentInfo, boolean, int, int)
org.apache.lucene.index.IndexWriter$ReaderPool.get(SegmentInfo, boolean)
org.apache.lucene.index.BufferedDeletesStream.applyDeletes(IndexWriter$ReaderPool, List)
org.apache.lucene.index.IndexWriter.doFlush(boolean)
org.apache.lucene.index.IndexWriter.flush(boolean, boolean)
org.apache.lucene.index.IndexWriter.closeInternal(boolean)
org.apache.lucene.index.IndexWriter.close(boolean)
org.apache.lucene.index.IndexWriter.close()
org.apache.solr.update.SolrIndexWriter.close()
org.apache.solr.update.DirectUpdateHandler2.closeWriter()
org.apache.solr.update.DirectUpdateHandler2.commit(CommitUpdateCommand)
org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run()
java.util.concurrent.Executors$RunnableAdapter.call()
java.util.concurrent.FutureTask$Sync.innerRun()
java.util.concurrent.FutureTask.run()
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor$ScheduledFutureTask)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
java.util.concurrent.ThreadPoolExecutor$Worker.run()
java.lang.Thread.run()
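
The hot frame in the trace is BufferedDeletesStream.applyDeletes, which resolves buffered delete terms against every existing segment. A toy cost model (illustrative only, not Lucene's actual code) of why this step gets slower as the index grows:

```python
def apply_deletes_cost(num_segments, terms_per_segment):
    # applyDeletes scans each existing segment's term dictionary once,
    # so its cost is roughly linear in what is already on disk.
    return num_segments * terms_per_segment

def total_flush_cost(num_commits, docs_per_commit):
    # Each commit flushes a new segment (merges ignored for simplicity),
    # so the per-commit delete cost grows and the cumulative cost is
    # quadratic in the number of commits.
    total, segments = 0, 0
    for _ in range(num_commits):
        total += apply_deletes_cost(segments, docs_per_commit)
        segments += 1
    return total

# Twice the commits -> roughly four times the cumulative flush cost.
ratio = total_flush_cost(200, 5000) / total_flush_cost(100, 5000)
print(round(ratio, 2))  # ~4.02
```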


Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Roman Alekseenkov
I'm wondering if this is relevant:
https://issues.apache.org/jira/browse/LUCENE-2680 - Improve how
IndexWriter flushes deletes against existing segments

Roman



Re: large scale indexing issues / single threaded bottleneck

2011-10-30 Thread Roman Alekseenkov
Guys, thank you for all the replies.

I think I figured out a partial solution to the problem on Friday
night. Adding a whole bunch of debug statements to the info stream showed
that every document follows the "update document" path instead of the "add
document" path. This means all document IDs end up in the "pending
deletes" queue, and Solr has to rescan its index on every commit for
potential deletions. This is single-threaded and seems to get progressively
slower as the index grows.
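
The difference between the two paths can be sketched as follows (a toy model for illustration, not Lucene's actual code): "add" just appends, while "update" also buffers a delete term for the unique key, and every commit must then resolve those terms against all existing segments.

```python
class ToyIndex:
    """Toy model of add vs. update document paths (illustration only)."""

    def __init__(self):
        self.segments = []         # committed segments: dicts of id -> doc
        self.buffer = {}           # in-memory (not yet committed) segment
        self.pending_deletes = set()

    def add(self, doc_id, doc):
        # "add document" path: no delete term is buffered.
        self.buffer[doc_id] = doc

    def update(self, doc_id, doc):
        # "update document" path: a delete term for the unique key is
        # buffered even if the document has never been seen before.
        self.pending_deletes.add(doc_id)
        self.buffer[doc_id] = doc

    def commit(self):
        # The applyDeletes analogue: scan every existing segment for the
        # pending terms. This is the step that grows with index size.
        scanned = 0
        for seg in self.segments:
            scanned += len(seg)
            for doc_id in self.pending_deletes:
                seg.pop(doc_id, None)
        self.pending_deletes.clear()
        self.segments.append(self.buffer)
        self.buffer = {}
        return scanned  # work done resolving deletes on this commit

idx = ToyIndex()
for batch in range(3):
    for d in range(1000):
        idx.update(f"doc-{batch}-{d}", object())
    work = idx.commit()
# With update(), the per-commit delete work grows (0, 1000, 2000, ...);
# with add(), it would stay at zero.
```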

Adding overwrite=false to the URL for the /update handler did NOT help: my
debug statements showed that messages still go to the updateDocument()
function with deleteTerm not being null. So I hacked Lucene a little and,
as a temporary workaround, set deleteTerm=null at the beginning of
updateDocument(); it no longer calls applyDeletes().
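
For anyone trying the same thing: overwrite is an ordinary request parameter on the /update handler. A minimal sketch of building such a request URL (the host, port, and use of urllib here are my assumptions, not from the thread):

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host/port for your cluster.
base_url = "http://localhost:8983/solr/update"
params = {"overwrite": "false"}  # per the thread, this alone did NOT help on 3.4
url = base_url + "?" + urlencode(params)
print(url)  # http://localhost:8983/solr/update?overwrite=false
```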

This gave a 6-8x performance boost, and now we can index about 9 million
documents/hour (producing 20GB of index every hour). Right now the index
is at 1TB and growing, with no noticeable degradation of indexing speed.
This is decent, but the 24-core machine is still barely utilized :)

Now I think we're hitting a merge bottleneck, where all indexing threads
are paused. ConcurrentMergeScheduler with 4 threads is not helping much.
I guess the changes on trunk would definitely help, but we will likely
stay on 3.4.

I will dig more into the issue on Monday. I'm really curious to see why
"overwrite=false" didn't help but the hack did.

Once again, thank you for the answers and recommendations

Roman



--
View this message in context: http://lucene.472066.n3.nabble.com/large-scale-indexing-issues-single-threaded-bottleneck-tp3461815p3466523.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: large scale indexing issues / single threaded bottleneck

2011-11-01 Thread Roman Alekseenkov
We have a rate of 2K small docs/sec, which translates into 90 GB/day of
index space.
You should be fine.
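
A quick back-of-the-envelope comparison of the two workloads supports this (rates taken from the thread; document sizes are not specified, so this compares ingest rates only):

```python
# Rates quoted in the thread, normalized to docs per minute.
our_per_min = 2000 * 60   # "2K small docs/sec"
shishir_per_min = 20      # "approximately 20/min"

ratio = our_per_min / shishir_per_min
print(int(ratio))  # 6000 -- the proposed load is thousands of times lighter
```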

Roman


Awasthi, Shishir wrote:
> 
> Roman,
> How frequently do you update your index? I have a need to do real-time
> add/delete of SOLR documents at a rate of approximately 20/min.
> The total number of documents is in the range of 4 million. Will there
> be any performance issues?
> 
> Thanks,
> Shishir
> 


--
View this message in context: http://lucene.472066.n3.nabble.com/large-scale-indexing-issues-single-threaded-bottleneck-tp3461815p3472901.html
Sent from the Solr - User mailing list archive at Nabble.com.