Here is an example using the Lucene benchmark package. Indexing 64,000 Wikipedia docs (sorry for the formatting):

[java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 32 out of 256058)
[java] Operation      round  mrg  flush     runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem   avgTotalMem
[java] MAddDocs_8000  0      10   32.00MB   8       8000        37.40  1,711.22    124,612,472  182,689,792
[java] MAddDocs_8000  1      10   80.00MB   8       8000        39.91  1,603.76    266,716,128  469,925,888
[java] MAddDocs_8000  2      10   120.00MB  8       8000        40.74  1,571.02    348,059,488  548,233,216
[java] MAddDocs_8000  3      10   512.00MB  8       8000        38.25  1,673.05    746,087,808  926,089,216

After about 32-40 MB, you don't gain much, and throughput starts decreasing once the buffer gets too high. 8GB is a terrible recommendation.

Also, from the javadoc in IndexWriter:

 * <p> <b>NOTE</b>: because IndexWriter uses
 * <code>int</code>s when managing its internal storage,
 * the absolute maximum value for this setting is somewhat
 * less than 2048 MB. The precise limit depends on
 * various factors, such as how large your documents are,
 * how many fields have norms, etc., so it's best to set
 * this value comfortably under 2048.</p>

Mark Miller wrote:
> 8 GB is much larger than is well supported. It's diminishing returns over
> 40-100 and mostly a waste of RAM. Too high and things can break. It
> should be well below 2 GB at most, but I'd still recommend 40-100.
>
> Fuad Efendi wrote:
>
>> Reason of having big RAM buffer is lowering frequency of IndexWriter flushes
>> and (subsequently) lowering frequency of index merge events, and
>> (subsequently) merging of a few larger files takes less time... especially
>> if RAM Buffer is intelligent enough (and big enough) to deal with 100
>> concurrent updates of existing document without 100-times flushing to disk
>> of 100 document versions.
>>
>> I posted here thread related; I had 1:5 timing for Update:Merge (5 minutes
>> merge, and 1 minute update) with default SOLR settings (32Mb buffer). I
>> increased buffer to 8Gb on Master, and it triggered significant indexing
>> performance boost...
>>
>> -Fuad
>> http://www.linkedin.com/in/liferay
>>
>>> -----Original Message-----
>>> From: Mark Miller [mailto:markrmil...@gmail.com]
>>> Sent: October-23-09 3:03 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Too many open files
>>>
>>> I wouldn't use a RAM buffer of a gig - 32-100 is generally a good number.
>>>
>>> Fuad Efendi wrote:
>>>
>>>> I was partially wrong; this is what Mike McCandless (Lucene-in-Action,
>>>> 2nd edition) explained at Manning forum:
>>>>
>>>> mergeFactor of 1000 means you will have up to 1000 segments at each level.
>>>> A level 0 segment means it was flushed directly by IndexWriter.
>>>> After you have 1000 such segments, they are merged into a single level 1
>>>> segment.
>>>> Once you have 1000 level 1 segments, they are merged into a single level 2
>>>> segment, etc.
>>>> So, depending on how many docs you add to your index, you could have
>>>> 1000s of segments w/ mergeFactor=1000.
>>>>
>>>> http://www.manning-sandbox.com/thread.jspa?threadID=33784&tstart=0
>>>>
>>>> So, in case of mergeFactor=100 you may have (theoretically) 1000 segments,
>>>> 10-20 files each (depending on schema)...
>>>>
>>>> mergeFactor=10 is default setting... ramBufferSizeMB=1024 means that you
>>>> need at least double Java heap, but you have -Xmx1024m...
>>>>
>>>> -Fuad
>>>>
>>>>> I am getting too many open files error.
>>>>>
>>>>> Usually I test on a server that has 4GB RAM and assigned 1GB for
>>>>> tomcat(set JAVA_OPTS=-Xms256m -Xmx1024m), ulimit -n is 256 for this
>>>>> server and has following setting for SolrConfig.xml
>>>>>
>>>>> <useCompoundFile>true</useCompoundFile>
>>>>>
>>>>> <ramBufferSizeMB>1024</ramBufferSizeMB>
>>>>>
>>>>> <mergeFactor>100</mergeFactor>
>>>>>
>>>>> <maxMergeDocs>2147483647</maxMergeDocs>
>>>>>
>>>>> <maxFieldLength>10000</maxFieldLength>
>>>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>

--
- Mark

http://www.lucidimagination.com
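
As a rough sketch of the advice in this thread, the settings from the original question could be adjusted along these lines. The value 64 for ramBufferSizeMB is only an illustrative pick from the 32-100 range suggested above, and mergeFactor is set back to the default of 10 mentioned earlier in the thread; the other elements are unchanged from the question. Treat it as a starting point to benchmark against your own data, not a definitive configuration.

```xml
<!-- Unchanged from the original question -->
<useCompoundFile>true</useCompoundFile>

<!-- Illustrative value inside the suggested 32-100 MB range;
     in any case keep it comfortably under 2048 and well below the Java heap (-Xmx) -->
<ramBufferSizeMB>64</ramBufferSizeMB>

<!-- Default value per the thread; fewer segments per level means fewer open files -->
<mergeFactor>10</mergeFactor>

<!-- Unchanged from the original question -->
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>
```

A lower mergeFactor matters here because ulimit -n is only 256 on the server in question, so either the segment count has to stay small or the open-file limit has to be raised.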