On 9/7/2017 4:25 AM, yasoobhaider wrote:
> So I did a little more digging around why the merging is taking so
> long, and it looks like merging postings is the culprit. On the 5.4
> version, merging 500 docs is taking approximately 100 msec, while on
> the 6.6 version, it is taking more than 3000 msec. The difference
> seems to get worse when more docs are being merged. Any ideas why this
> may be the case?
The rest of this thread has been completely lost here; I only found the
earlier messages by going to Nabble, which mirrors the mailing list in
forum format. The mailing list itself is the canonical repository.

Setting ramBufferSizeMB to nearly 5 gigabytes is only going to help if
the documents you are indexing into Solr are enormous -- many megabytes
of text in each one. Testing by Solr developers has shown that values
above about 128MB typically provide no performance advantage with
normal-sized documents. Your commit characteristics will have far more
influence on segment size than ramBufferSizeMB does. The default
ramBufferSizeMB value in modern Solr versions is 100.

Assuming we are dealing with relatively small documents, I would
recommend the settings below, removing ramBufferSizeMB,
mergePolicyFactory, and maxBufferedDocs entirely. Note that autoCommit
and autoSoftCommit belong in the updateHandler section of
solrconfig.xml, while mergeScheduler goes in indexConfig (see the P.S.
for a sketch of the layout):

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>600000</maxTime>
</autoSoftCommit>

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">6</int>
  <int name="maxThreadCount">1</int>
</mergeScheduler>

If your data is on standard spinning disks, you want maxThreadCount at
one. If it's on SSD, you can raise it a little, but I wouldn't go
beyond about 2 or 3. On standard disks, multiple threads writing merged
segments at the same time will make the disk thrash excessively, and
I/O will slow to a crawl.

If the documents are huge, then you can raise ramBufferSizeMB, but five
gigabytes is REALLY BIG and will require a very large heap.

If there is a good reason to increase the values in mergePolicy, this
is what I would recommend:

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">30</int>
  <int name="segmentsPerTier">30</int>
  <int name="maxMergeAtOnceExplicit">90</int>
</mergePolicyFactory>

The settings I've described here may help, or they may do nothing at
all. If they don't help, then the problem may be memory-related, which
is a whole separate discussion.

When Lucene says "too many merge threads, stalling", it means that many
merges are scheduled at the same time, which usually means that
multiple *levels* of merging are scheduled -- one merge that combines a
bunch of initial-level segments into a second-level segment, another
that combines several second-level segments into a third-level segment,
and so on. The "stalling" means that the *indexing* thread is paused
until the number of scheduled merges drops below maxMergeCount. If this
is happening with maxMergeCount at eight, it is likely because of your
current autoCommit maxDocs setting of 10000: each initial segment is
very small, so there are a LOT of segments that need merging. The
autoCommit and autoSoftCommit settings I provided should make that less
of a problem.

Merging segments is slower than the raw speed of your disks, because
Lucene must gather a lot of information from each source segment and
combine it all in memory before it can write the new segment. That
gathering and combining is much slower than modern disk transfer rates.

Thanks,
Shawn
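
P.S. To make the placement concrete, here is a minimal sketch of how
the elements above fit into solrconfig.xml. This assumes the stock
layout of the file; the class attribute on updateHandler is the
standard one, and everything not shown is left at its defaults:

<!-- Commit settings live under updateHandler, not indexConfig. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit once a minute, without opening a new searcher. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit every ten minutes controls document visibility. -->
  <autoSoftCommit>
    <maxTime>600000</maxTime>
  </autoSoftCommit>
</updateHandler>

<!-- Merge tuning lives under indexConfig. -->
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">6</int>
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
</indexConfig>

With this arrangement, new segments are created on a time schedule
rather than every 10000 docs, so they are fewer and larger, and the
merge scheduler is far less likely to pile up enough merges to stall
indexing.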