Hmm - came out worse than it looked. Here is a better attempt:

MergeFactor: 10

BUF (MB)   DOCS/S
32         37.40
80         39.91
120        40.74
512        38.25

Mark Miller wrote:
> Here is an example using the Lucene benchmark package. Indexing 64,000
> wikipedia docs (sorry for the formatting):
>
>      [java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 32 out of 256058)
>      [java] Operation      round mrg    flush  runCnt recsPerRun  rec/s elapsedSec   avgUsedMem  avgTotalMem
>      [java] MAddDocs_8000      0  10  32.00MB       8       8000  37.40   1,711.22  124,612,472  182,689,792
>      [java] MAddDocs_8000      1  10  80.00MB       8       8000  39.91   1,603.76  266,716,128  469,925,888
>      [java] MAddDocs_8000      2  10 120.00MB       8       8000  40.74   1,571.02  348,059,488  548,233,216
>      [java] MAddDocs_8000      3  10 512.00MB       8       8000  38.25   1,673.05  746,087,808  926,089,216
>
> After about 32-40 MB, you don't gain much, and throughput starts decreasing
> once you get too high. 8GB is a terrible recommendation.
>
> Also, from the javadoc in IndexWriter:
>
>    * <p> <b>NOTE</b>: because IndexWriter uses
>    * <code>int</code>s when managing its internal storage,
>    * the absolute maximum value for this setting is somewhat
>    * less than 2048 MB.  The precise limit depends on
>    * various factors, such as how large your documents are,
>    * how many fields have norms, etc., so it's best to set
>    * this value comfortably under 2048.</p>
>
> Mark Miller wrote:
>   
>> 8 GB is much larger than is well supported. It's diminishing returns
>> above 40-100 MB, and mostly a waste of RAM. Too high and things can break.
>> It should be kept well below 2 GB, but I'd still recommend 40-100.
>>
>> Fuad Efendi wrote:
>>> The reason for having a big RAM buffer is to lower the frequency of
>>> IndexWriter flushes and (subsequently) the frequency of index merge
>>> events; merging a few larger files takes less time... especially if the
>>> RAM buffer is intelligent enough (and big enough) to deal with 100
>>> concurrent updates of an existing document without flushing 100 document
>>> versions to disk.
>>>
>>> I posted a related thread here; I had a 1:5 timing for Update:Merge
>>> (5 minutes merge, and 1 minute update) with default SOLR settings (32MB
>>> buffer). I increased the buffer to 8GB on the Master, and it triggered a
>>> significant indexing performance boost... 
>>>
>>> -Fuad
>>> http://www.linkedin.com/in/liferay
>>>
>>>
>>>> -----Original Message-----
>>>> From: Mark Miller [mailto:markrmil...@gmail.com]
>>>> Sent: October-23-09 3:03 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Too many open files
>>>>
>>>> I wouldn't use a RAM buffer of a gig - 32-100 is generally a good number.
>>>>
>>>> Fuad Efendi wrote:
>>>>> I was partially wrong; this is what Mike McCandless (Lucene-in-Action,
>>>>> 2nd edition) explained at the Manning forum:
>>>>>
>>>>> mergeFactor of 1000 means you will have up to 1000 segments at each
>>>>> level.
>>>>> A level 0 segment means it was flushed directly by IndexWriter.
>>>>> After you have 1000 such segments, they are merged into a single level 1
>>>>> segment.
>>>>> Once you have 1000 level 1 segments, they are merged into a single
>>>>> level 2 segment, etc.
>>>>> So, depending on how many docs you add to your index, you could have
>>>>> 1000s of segments w/ mergeFactor=1000.
>>>>>
>>>>> http://www.manning-sandbox.com/thread.jspa?threadID=33784&tstart=0
>>>>>
>>>>>
>>>>> So, in the case of mergeFactor=100 you may (theoretically) have 1000
>>>>> segments, with 10-20 files each (depending on schema)...
>>>>>
>>>>>
>>>>> mergeFactor=10 is the default setting... ramBufferSizeMB=1024 means that
>>>>> you need at least double that in Java heap, but you have -Xmx1024m...
>>>>>
>>>>>
>>>>> -Fuad
>>>>>
>>>>>
>>>>>
>>>>>> I am getting a "too many open files" error.
>>>>>>
>>>>>> Usually I test on a server that has 4GB RAM with 1GB assigned to
>>>>>> Tomcat (set JAVA_OPTS=-Xms256m -Xmx1024m); ulimit -n is 256 for this
>>>>>> server, and it has the following settings in SolrConfig.xml:
>>>>>>
>>>>>>
>>>>>>
>>>>>>     <useCompoundFile>true</useCompoundFile>
>>>>>>
>>>>>>     <ramBufferSizeMB>1024</ramBufferSizeMB>
>>>>>>
>>>>>>     <mergeFactor>100</mergeFactor>
>>>>>>
>>>>>>     <maxMergeDocs>2147483647</maxMergeDocs>
>>>>>>
>>>>>>     <maxFieldLength>10000</maxFieldLength>
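A rough back-of-envelope estimate shows why mergeFactor=100 collides with ulimit -n 256 (the formula and the files-per-segment figure are illustrative assumptions based on the "10-20 files each" estimate above, not exact Lucene accounting):

```python
# Worst-case open-file estimate: up to (merge_factor - 1) live segments
# per level, each segment holding files_per_segment files on disk.
# Illustrative arithmetic only, not exact Lucene file accounting.
def worst_case_open_files(merge_factor, levels, files_per_segment):
    return (merge_factor - 1) * levels * files_per_segment

# mergeFactor=100, two segment levels, ~10 files per non-compound segment
print(worst_case_open_files(100, 2, 10))  # 1980 -- far above ulimit -n 256
# the default mergeFactor=10 stays comfortably under the limit
print(worst_case_open_files(10, 2, 10))   # 180
```

Even with useCompoundFile=true shrinking the per-segment file count, a high mergeFactor multiplies the number of live segments, so a low ulimit gets exhausted during merges and searches.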
>>>>>>
>>>>>>
>>>> --
>>>> - Mark
>>>>
>>>> http://www.lucidimagination.com
>>>>
>>>>
>
>


-- 
- Mark

http://www.lucidimagination.com


