Upon further investigation of this issue, I see the following log lines during indexing:
2019-06-06 22:24:56.203 INFO (qtp1169794610-5652) [c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623 s:shard22 r:core_node87 x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84] org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: trigger flush: activeBytes=352402600 deleteBytes=279 vs limit=104857600
2019-06-06 22:24:56.203 INFO (qtp1169794610-5652) [c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623 s:shard22 r:core_node87 x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84] org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: thread state has 352402600 bytes; docInRAM=1
2019-06-06 22:24:56.204 INFO (qtp1169794610-5652) [c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623 s:shard22 r:core_node87 x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84] org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: 1 in-use non-flushing threads states
2019-06-06 22:24:56.204 INFO (qtp1169794610-5652) [c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623 s:shard22 r:core_node87

I have the following questions:

1) The log line says "thread state has 352402600 bytes; docInRAM=1". Does this mean that the buffer was flushed to disk with only one huge document?
2) If yes, does this flush create a segment with just one document?
3) Heap dump analysis shows large (>350 MB) instances of DocumentsWriterPerThread. Does one instance of this class correspond to one document?

Any help is much appreciated.

Thanks,
Rahul

On Fri, Jul 5, 2019 at 2:11 AM Rahul Goswami <rahul196...@gmail.com> wrote:

> Shawn, Erick,
> Thank you for the explanation. The merge scheduler params make sense now.
>
> Thanks,
> Rahul
>
> On Wed, Jul 3, 2019 at 11:30 AM Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Two more tidbits to add to Shawn’s explanation:
>>
>> There are heuristics built in to ConcurrentMergeScheduler.
>> From the Javadocs:
>> * If it's an SSD,
>> * {@code maxThreadCount} is set to {@code max(1, min(4, cpuCoreCount/2))},
>> * otherwise 1. Note that detection only currently works on
>> * Linux; other platforms will assume the index is not on an SSD.
>>
>> Second, TieredMergePolicy (the default) merges in “tiers” that
>> are of similar size. So you can have multiple merges going on
>> at the same time on disjoint sets of segments.
>>
>> Best,
>> Erick
>>
>> > On Jul 3, 2019, at 7:54 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>> >
>> > On 7/2/2019 10:53 PM, Rahul Goswami wrote:
>> >> Hi Shawn,
>> >> Thank you for the detailed suggestions. Although, I would like to
>> >> understand the maxMergeCount and maxThreadCount params better. The
>> >> documentation
>> >> <https://lucene.apache.org/solr/guide/7_3/indexconfig-in-solrconfig.html#mergescheduler>
>> >> mentions that
>> >> maxMergeCount : The maximum number of simultaneous merges that are allowed.
>> >> maxThreadCount : The maximum number of simultaneous merge threads that
>> >> should be running at once
>> >> Since one thread can only do 1 merge at any given point of time, how does
>> >> maxMergeCount being greater than maxThreadCount help anyway? I am having
>> >> difficulty wrapping my head around this, and would appreciate if you could
>> >> help clear it for me.
>> >
>> > The maxMergeCount setting controls the number of merges that can be
>> > *scheduled* at the same time. As soon as that number of merges is reached,
>> > the indexing thread(s) will be paused until the number of merges in the
>> > schedule drops below this number. This ensures that no more merges will be
>> > scheduled.
>> >
>> > By setting maxMergeCount higher than the number of merges that are
>> > expected in the schedule, you can ensure that indexing will never be
>> > paused. It would require very atypical merge policy settings for the
>> > number of scheduled merges to ever reach six. On my own indexing, I
>> > reached three scheduled merges quite frequently. The default setting for
>> > maxMergeCount is three.
>> >
>> > The maxThreadCount setting controls how many of the scheduled merges
>> > will be simultaneously executed. With index data on standard spinning
>> > disks, you do not want to increase this number beyond 1, or you will have a
>> > performance problem due to thrashing disk heads. If your data is on SSD,
>> > you can make it larger than 1.
>> >
>> > Thanks,
>> > Shawn
>>
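
For reference, the maxThreadCount heuristic that Erick quotes from the ConcurrentMergeScheduler Javadocs can be sketched in plain Java. This is only a standalone illustration of the quoted formula, not Lucene's actual code: the class name, the method name, and the onSsd flag are hypothetical, and the flag is passed in by hand rather than detected from the file system.

```java
// Standalone sketch of the formula quoted from the Javadocs in this thread.
// MergeThreadHeuristic and defaultMaxThreadCount are hypothetical names,
// not part of Lucene or Solr.
final class MergeThreadHeuristic {

    // onSsd is an assumption supplied by the caller; per the quoted Javadocs,
    // real detection only works on Linux and other platforms are treated as non-SSD.
    static int defaultMaxThreadCount(boolean onSsd, int cpuCoreCount) {
        if (!onSsd) {
            return 1; // spinning disk, or SSD not detected
        }
        // max(1, min(4, cpuCoreCount/2)) with integer division
        return Math.max(1, Math.min(4, cpuCoreCount / 2));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
                + " ssd=" + defaultMaxThreadCount(true, cores)
                + " spinning=" + defaultMaxThreadCount(false, cores));
    }
}
```

Under this formula the thread count is capped at 4 no matter how many cores are available, and it stays at 1 on spinning disks, which matches Shawn's advice above about not raising maxThreadCount beyond 1 when the index is not on SSD.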