Here is an example using the Lucene benchmark package, indexing 64,000
Wikipedia docs (sorry for the formatting):

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 32 out of 256058)
     [java] Operation      round  mrg     flush  runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem  avgTotalMem
     [java] MAddDocs_8000      0   10   32.00MB       8        8000  37.40    1,711.22  124,612,472  182,689,792
     [java] MAddDocs_8000      1   10   80.00MB       8        8000  39.91    1,603.76  266,716,128  469,925,888
     [java] MAddDocs_8000      2   10  120.00MB       8        8000  40.74    1,571.02  348,059,488  548,233,216
     [java] MAddDocs_8000      3   10  512.00MB       8        8000  38.25    1,673.05  746,087,808  926,089,216

Beyond a RAM buffer of about 32-40 MB you don't gain much, and throughput
starts decreasing once the buffer gets too large. 8 GB is a terrible
recommendation.
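
To make the diminishing returns concrete, here is a rough Python sketch that
recomputes rec/s from the report's runCnt, recsPerRun and elapsedSec columns
and compares each round with the 32 MB round. The function names are mine,
not part of the benchmark package, and the numbers are simply copied from
the report.

```python
# Figures copied from the benchmark report: (flush size in MB, elapsed seconds).
# Every round made 8 runs of 8,000 records each, i.e. 64,000 docs in total.
RUNS = 8
RECS_PER_RUN = 8000
rounds = [(32, 1711.22), (80, 1603.76), (120, 1571.02), (512, 1673.05)]


def docs_per_sec(elapsed_sec, runs=RUNS, recs_per_run=RECS_PER_RUN):
    """Throughput: total records indexed divided by elapsed time."""
    return runs * recs_per_run / elapsed_sec


def gains_vs_baseline(rows):
    """Return (flush_mb, rec/s, percent change vs. the first row) tuples."""
    base = docs_per_sec(rows[0][1])
    out = []
    for flush_mb, elapsed in rows:
        rate = docs_per_sec(elapsed)
        pct = 100 * (rate - base) / base
        out.append((flush_mb, round(rate, 2), round(pct, 1)))
    return out


if __name__ == "__main__":
    for flush_mb, rate, pct in gains_vs_baseline(rounds):
        print(f"{flush_mb:>4} MB  {rate:6.2f} rec/s  {pct:+5.1f}% vs 32 MB")
```

Going from 32 MB to 120 MB buys roughly 9% here, and the 512 MB round is
slower than both the 80 MB and 120 MB rounds while using far more heap.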

Also, from the javadoc in IndexWriter:

   * <p> <b>NOTE</b>: because IndexWriter uses
   * <code>int</code>s when managing its internal storage,
   * the absolute maximum value for this setting is somewhat
   * less than 2048 MB.  The precise limit depends on
   * various factors, such as how large your documents are,
   * how many fields have norms, etc., so it's best to set
   * this value comfortably under 2048.</p>
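
As a rough illustration of that note, a sanity check on a ramBufferSizeMB
value could look like the Python sketch below. The helper is hypothetical
(it is not part of Lucene or Solr), and its two thresholds are assumptions
taken from this thread: the javadoc's 2048 MB ceiling and the 32-100 MB range
recommended here. The real limit is somewhat lower than 2048 and depends on
your documents, so treat this as a guide only.

```python
# Hypothetical helper, not part of Lucene or Solr.
# HARD_LIMIT_MB follows the javadoc note ("comfortably under 2048");
# SUGGESTED_MAX_MB is the upper end of the range recommended in this thread.
HARD_LIMIT_MB = 2048
SUGGESTED_MAX_MB = 100


def check_ram_buffer_mb(value_mb):
    """Classify a ramBufferSizeMB value as 'ok', 'wasteful' or 'too_large'."""
    if value_mb <= 0:
        raise ValueError("ramBufferSizeMB must be positive")
    if value_mb >= HARD_LIMIT_MB:
        # At or above the int-based storage ceiling described in the javadoc.
        return "too_large"
    if value_mb > SUGGESTED_MAX_MB:
        # Likely to work, but past the point where the benchmark shows gains.
        return "wasteful"
    return "ok"


if __name__ == "__main__":
    for mb in (32, 1024, 8192):
        print(mb, check_ram_buffer_mb(mb))
```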

Mark Miller wrote:
> 8 GB is much larger than is well supported. It's diminishing returns over
> 40-100 MB and mostly a waste of RAM. Too high and things can break. It
> should be well below 2 GB at most, but I'd still recommend 40-100 MB.
>
> Fuad Efendi wrote:
>   
>> The reason for having a big RAM buffer is to lower the frequency of
>> IndexWriter flushes and, as a result, the frequency of index merge events;
>> merging a few larger files then takes less time, especially if the RAM
>> buffer is intelligent enough (and big enough) to handle 100 concurrent
>> updates of an existing document without flushing 100 document versions
>> to disk 100 times.
>>
>> I posted a related thread here; I had 1:5 timing for Update:Merge (5 minutes
>> merge, and 1 minute update) with default SOLR settings (32 MB buffer). I
>> increased the buffer to 8 GB on the master, and it triggered a significant
>> indexing performance boost...
>>
>> -Fuad
>> http://www.linkedin.com/in/liferay
>>
>>> -----Original Message-----
>>> From: Mark Miller [mailto:markrmil...@gmail.com]
>>> Sent: October-23-09 3:03 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Too many open files
>>>
>>> I wouldn't use a RAM buffer of a gig - 32-100 MB is generally a good range.
>>>
>>> Fuad Efendi wrote:
>>>> I was partially wrong; this is what Mike McCandless (Lucene in Action, 2nd
>>>> edition) explained at the Manning forum:
>>>>
>>>> mergeFactor of 1000 means you will have up to 1000 segments at each level.
>>>> A level 0 segment means it was flushed directly by IndexWriter.
>>>> After you have 1000 such segments, they are merged into a single level 1
>>>> segment.
>>>> Once you have 1000 level 1 segments, they are merged into a single level 2
>>>> segment, etc.
>>>> So, depending on how many docs you add to your index, you could have
>>>> 1000s of segments w/ mergeFactor=1000.
>>>>
>>>> http://www.manning-sandbox.com/thread.jspa?threadID=33784&tstart=0
>>>>
>>>>
>>>> So, in the case of mergeFactor=100 you may have (theoretically) 1000
>>>> segments, 10-20 files each (depending on schema)...
>>>>
>>>>
>>>> mergeFactor=10 is the default setting... ramBufferSizeMB=1024 means that
>>>> you need at least double that in Java heap, but you have -Xmx1024m...
>>>>
>>>>
>>>> -Fuad
>>>>
>>>>
>>>>
>>>>> I am getting a "too many open files" error.
>>>>>
>>>>> Usually I test on a server that has 4 GB RAM, with 1 GB assigned to
>>>>> Tomcat (set JAVA_OPTS=-Xms256m -Xmx1024m). ulimit -n is 256 on this
>>>>> server, and SolrConfig.xml has the following settings:
>>>>>
>>>>>
>>>>>
>>>>>     <useCompoundFile>true</useCompoundFile>
>>>>>
>>>>>     <ramBufferSizeMB>1024</ramBufferSizeMB>
>>>>>
>>>>>     <mergeFactor>100</mergeFactor>
>>>>>
>>>>>     <maxMergeDocs>2147483647</maxMergeDocs>
>>>>>
>>>>>     <maxFieldLength>10000</maxFieldLength>
>>>>>
>>>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>


-- 
- Mark

http://www.lucidimagination.com


