Hi Grant,

It looks like I have temporarily solved the problem with some not-so-obvious settings:
ramBufferSizeMB=8192
mergeFactor=10
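
For reference, a minimal sketch of how these two settings would look in a Solr 1.3/1.4-era solrconfig.xml (exact section and placement may differ by version; values taken from above):

```xml
<indexDefaults>
  <!-- Buffer up to 8GB of added documents in RAM before flushing a segment -->
  <ramBufferSizeMB>8192</ramBufferSizeMB>
  <!-- Merge once 10 segments accumulate at the same level -->
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```

With a buffer this large, segments are flushed rarely and start out big, which is consistent with seeing only 5 files and no merges after 30 million docs.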


Starting from scratch on different hardware (with much more RAM and CPU;
regular SATA), I added/updated 30 million docs within 3 hours... without
any merge yet! The index size grew from 0 to 8Gb (5 files). Previously I
had about 10 merges per hour, each taking about 5 minutes.


Thanks for the link; is it easy to plug a custom MergePolicy into Solr? I'll
do more research...
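
In case it helps: a hedged sketch of what plugging a merge policy into a Solr 1.4-era solrconfig.xml might look like (the class names below are Lucene's stock implementations, shown only as placeholders for a custom class):

```xml
<mainIndex>
  <!-- Replace with a custom MergePolicy implementation's class name -->
  <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>
  <!-- The merge scheduler can be swapped the same way -->
  <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>
</mainIndex>
```

A custom scheduler class configured this way is where the "merge small segments now, large ones at slow times" logic from the thread below would live.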


My specific "use case": many updates of existing documents in the index
(although only the "timestamp" field changes in each "refreshed" document).



-----Original Message-----
From: Grant Ingersoll 
Sent: August-11-09 9:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

Is there a time of day you could schedule merges?  See
http://www.lucidimagination.com/search/document/bd53b0431f7eada5/concurrentmergescheduler_and_mergepolicy_question

Or, you might be able to implement a scheduler that only merges the  
small segments, and then does the larger ones at slow times.  I  
believe there is a Lucene issue for this that is mentioned by Shai on  
that thread above.


On Aug 11, 2009, at 5:31 PM, Fuad Efendi wrote:

> Forgot to add: committing only once a day
>
> I tried mergeFactor=1000 and index-write performance was extremely good
> (more than 50,000,000 updates during part of a day).
> However, "commit" was taking 2 days or more and I simply killed the
> process (suspecting that it could damage my hard drive); I had about
> 8000 files in the index that day... 3 minutes of waiting until each new
> small *.del file appeared, and after several thousand such files I
> killed the process.
>
> Most probably it is "delete" in Lucene... it needs to rewrite the
> inverted index (in fact, to optimize)...? Not sure.
>
>
>
> -----Original Message-----
>
> I never tried profiling;
> 3000-5000 docs per second if SOLR is not busy with a segment merge;
>
> During segment merge: 99% CPU, no disk swap; I can't suspect I/O...
>
> During document updates (small batches of 100-1000 docs): only 5-15% CPU
>
> -server with 2048Mb heap for the JVM (which is JRockit) + 256M for the RAM buffer
>
> I can't suspect garbage collection... I'll try the same on much better
> hardware tomorrow (2 quad-core CPUs instead of a single dual-core,
> SCSI RAID0 instead of a single SAS drive, 16Gb for Tomcat instead of
> the current 2Gb), but the constant 5:1 ratio is very suspicious...
>
>
>
> -----Original Message-----
> From: Grant Ingersoll
> Sent: August-11-09 5:01 PM
>
> Have you tried profiling?  How often are you committing?  Have you
> looked at Garbage Collection or any of the usual suspects like that?
>
>
> On Aug 11, 2009, at 4:49 PM, Fuad Efendi wrote:
>
>> In a heavily loaded write-only master SOLR, I have 5 minutes of RAM
>> buffer flush / segment merge per 1 minute of (heavy) batch document
>> updates.
>
> Define heavy.  How many docs per second?
>
>
>
>
>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search
