Hmm - came out worse than it looked. Here is a better attempt:

MergeFactor: 10

BUF (MB)   DOCS/S
32         37.40
80         39.91
120        40.74
512        38.25

Mark Miller wrote:
> Here is an example using the Lucene benchmark package. Indexing 64,000
> Wikipedia docs (sorry for the formatting):
>
>  [java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 32 out of 256058)
>  [java] Operation      round  mrg  flush     runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem   avgTotalMem
>  [java] MAddDocs_8000  0      10   32.00MB   8       8000        37.40  1,711.22    124,612,472  182,689,792
>  [java] MAddDocs_8000  1      10   80.00MB   8       8000        39.91  1,603.76    266,716,128  469,925,888
>  [java] MAddDocs_8000  2      10   120.00MB  8       8000        40.74  1,571.02    348,059,488  548,233,216
>  [java] MAddDocs_8000  3      10   512.00MB  8       8000        38.25  1,673.05    746,087,808  926,089,216
>
> After about 32-40, you don't gain much, and it starts decreasing once
> you start getting too high. 8GB is a terrible recommendation.
>
> Also, from the javadoc in IndexWriter:
>
>  * <p> <b>NOTE</b>: because IndexWriter uses
>  * <code>int</code>s when managing its internal storage,
>  * the absolute maximum value for this setting is somewhat
>  * less than 2048 MB. The precise limit depends on
>  * various factors, such as how large your documents are,
>  * how many fields have norms, etc., so it's best to set
>  * this value comfortably under 2048.</p>
>
> Mark Miller wrote:
>> 8 GB is much larger than is well supported. It's diminishing returns over
>> 40-100 and mostly a waste of RAM. Too high and things can break. It
>> should be well below 2 GB at most, but I'd still recommend 40-100.
>>
>> Fuad Efendi wrote:
>>> Reason of having big RAM buffer is lowering frequency of IndexWriter flushes
>>> and (subsequently) lowering frequency of index merge events, and
>>> (subsequently) merging of a few larger files takes less time...
>>> especially if RAM Buffer is intelligent enough (and big enough) to deal
>>> with 100 concurrent updates of an existing document without flushing
>>> 100 document versions to disk 100 times.
>>>
>>> I posted a related thread here; I had 1:5 timing for Update:Merge (5 minutes
>>> merge, and 1 minute update) with default SOLR settings (32 MB buffer). I
>>> increased buffer to 8 GB on Master, and it triggered significant indexing
>>> performance boost...
>>>
>>> -Fuad
>>> http://www.linkedin.com/in/liferay
>>>
>>>> -----Original Message-----
>>>> From: Mark Miller [mailto:markrmil...@gmail.com]
>>>> Sent: October-23-09 3:03 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Too many open files
>>>>
>>>> I wouldn't use a RAM buffer of a gig - 32-100 is generally a good number.
>>>>
>>>> Fuad Efendi wrote:
>>>>> I was partially wrong; this is what Mike McCandless (Lucene-in-Action,
>>>>> 2nd edition) explained at Manning forum:
>>>>>
>>>>> mergeFactor of 1000 means you will have up to 1000 segments at each
>>>>> level.
>>>>> A level 0 segment means it was flushed directly by IndexWriter.
>>>>> After you have 1000 such segments, they are merged into a single level 1
>>>>> segment.
>>>>> Once you have 1000 level 1 segments, they are merged into a single level
>>>>> 2 segment, etc.
>>>>> So, depending on how many docs you add to your index, you could have
>>>>> 1000s of segments w/ mergeFactor=1000.
>>>>>
>>>>> http://www.manning-sandbox.com/thread.jspa?threadID=33784&tstart=0
>>>>>
>>>>> So, in case of mergeFactor=100 you may have (theoretically) 1000
>>>>> segments, 10-20 files each (depending on schema)...
>>>>>
>>>>> mergeFactor=10 is default setting... ramBufferSizeMB=1024 means that you
>>>>> need at least double Java heap, but you have -Xmx1024m...
>>>>>
>>>>> -Fuad
>>>>>
>>>>>> I am getting too many open files error.
>>>>>>
>>>>>> Usually I test on a server that has 4GB RAM and assigned 1GB for
>>>>>> tomcat(set JAVA_OPTS=-Xms256m -Xmx1024m), ulimit -n is 256 for this
>>>>>> server and has following setting for SolrConfig.xml
>>>>>>
>>>>>> <useCompoundFile>true</useCompoundFile>
>>>>>> <ramBufferSizeMB>1024</ramBufferSizeMB>
>>>>>> <mergeFactor>100</mergeFactor>
>>>>>> <maxMergeDocs>2147483647</maxMergeDocs>
>>>>>> <maxFieldLength>10000</maxFieldLength>
>>>>>>
>>>> --
>>>> - Mark
>>>>
>>>> http://www.lucidimagination.com

--
- Mark

http://www.lucidimagination.com
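
To put rough numbers on the mergeFactor explanation quoted in the thread (up to mergeFactor segments at each level), here is a minimal back-of-the-envelope sketch in Java. It is not Lucene code and calls no Lucene API. The class name, the method names, the assumed number of levels (2) and the files-per-segment figure (10, the low end of the 10-20 range mentioned in the thread) are all illustrative assumptions. Only the ulimit of 256 and the mergeFactor values come from the messages above.

```java
public class OpenFileEstimate {

    // Upper bound on live segments, assuming at most mergeFactor
    // segments can exist at each level before they are merged.
    static long maxSegments(int mergeFactor, int levels) {
        return (long) mergeFactor * levels;
    }

    // Rough upper bound on index files, given a files-per-segment count
    // (hypothetical parameter; depends on schema and compound-file use).
    static long maxOpenFiles(int mergeFactor, int levels, int filesPerSegment) {
        return maxSegments(mergeFactor, levels) * filesPerSegment;
    }

    public static void main(String[] args) {
        int ulimit = 256;          // "ulimit -n is 256" from the original post
        int levels = 2;            // assumption: two merge levels populated
        int filesPerSegment = 10;  // assumption: low end of "10-20 files each"

        for (int mergeFactor : new int[] {10, 100}) {
            long files = maxOpenFiles(mergeFactor, levels, filesPerSegment);
            System.out.println("mergeFactor=" + mergeFactor
                    + " -> up to " + files + " files"
                    + (files > ulimit ? " (exceeds ulimit " : " (within ulimit ")
                    + ulimit + ")");
        }
    }
}
```

This is only an estimate of index files on disk. It ignores files held open by searchers, in-flight merges and the rest of the servlet container, so the real descriptor count can be higher.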