Upon further investigation into this issue, I see the following log lines
during the indexing process:

2019-06-06 22:24:56.203 INFO  (qtp1169794610-5652)
[c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623
s:shard22 r:core_node87
x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84]
org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: trigger
flush: activeBytes=352402600 deleteBytes=279 vs limit=104857600
2019-06-06 22:24:56.203 INFO  (qtp1169794610-5652)
[c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623
s:shard22 r:core_node87
x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84]
org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: thread
state has 352402600 bytes; docInRAM=1
2019-06-06 22:24:56.204 INFO  (qtp1169794610-5652)
[c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623
s:shard22 r:core_node87
x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84]
org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: 1 in-use
non-flushing threads states
2019-06-06 22:24:56.204 INFO  (qtp1169794610-5652)
[c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623
s:shard22 r:core_node87

I have the following questions:
1) The log line says "thread state has 352402600 bytes; docInRAM=1".
Does it mean that the buffer was flushed to disk with only one huge
document?
2) If yes, does this flush create a segment with just one document?
3) Heap dump analysis shows large (>350 MB) instances of
DocumentsWriterPerThread. Does one instance of this class correspond to one
document?
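
To make the numbers in the "trigger flush" line concrete, here is a minimal
sketch that pulls the key=value pairs out of the message and compares
activeBytes with the limit. The helper name parse_flush_line is my own and
is not part of Solr or Lucene; the message text is copied from the log
above and joined onto one line. The limit works out to exactly 100 MiB, and
the single buffered document is about 336 MiB, roughly 3.4 times the limit.

```python
import re

# "[FP] trigger flush" message copied from the log above, joined onto one line.
FLUSH_LINE = "trigger flush: activeBytes=352402600 deleteBytes=279 vs limit=104857600"

MIB = 1024 * 1024


def parse_flush_line(line):
    """Return every key=<integer> pair found in a flush log message.

    Hypothetical helper for illustration only; not a Solr or Lucene API.
    """
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


stats = parse_flush_line(FLUSH_LINE)
limit_mib = stats["limit"] / MIB          # configured flush limit, in MiB
active_mib = stats["activeBytes"] / MIB   # bytes buffered when the flush fired
ratio = stats["activeBytes"] / stats["limit"]

print(f"limit  = {limit_mib:.1f} MiB")
print(f"active = {active_mib:.1f} MiB ({ratio:.2f}x the limit)")
```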


Help is much appreciated.

Thanks,
Rahul


On Fri, Jul 5, 2019 at 2:11 AM Rahul Goswami <rahul196...@gmail.com> wrote:

> Shawn, Erick,
> Thank you for the explanation. The merge scheduler params make sense now.
>
> Thanks,
> Rahul
>
> On Wed, Jul 3, 2019 at 11:30 AM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Two more tidbits to add to Shawn’s explanation:
>>
>> There are heuristics built in to ConcurrentMergeScheduler.
>> From the Javadocs:
>> * If it's an SSD,
>> *  {@code maxThreadCount} is set to {@code max(1, min(4,
>> cpuCoreCount/2))},
>> *  otherwise 1.  Note that detection only currently works on
>> *  Linux; other platforms will assume the index is not on an SSD.
>>
>> Second, TieredMergePolicy (the default) merges in “tiers” that
>> are of similar size. So you can have multiple merges going on
>> at the same time on disjoint sets of segments.
>>
>> Best,
>> Erick
>>
>> > On Jul 3, 2019, at 7:54 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>> >
>> > On 7/2/2019 10:53 PM, Rahul Goswami wrote:
>> >> Hi Shawn,
>> >> Thank you for the detailed suggestions. However, I would like to
>> >> understand the maxMergeCount and maxThreadCount params better. The
>> >> documentation
>> >> <https://lucene.apache.org/solr/guide/7_3/indexconfig-in-solrconfig.html#mergescheduler>
>> >> mentions that
>> >> maxMergeCount : The maximum number of simultaneous merges that are
>> >> allowed.
>> >> maxThreadCount : The maximum number of simultaneous merge threads that
>> >> should be running at once
>> >> Since one thread can only do 1 merge at any given point of time, how
>> >> does maxMergeCount being greater than maxThreadCount help anyway? I am
>> >> having difficulty wrapping my head around this, and would appreciate it
>> >> if you could help clear it up for me.
>> >
>> > The maxMergeCount setting controls the number of merges that can be
>> > *scheduled* at the same time.  As soon as that number of merges is
>> > reached, the indexing thread(s) will be paused until the number of
>> > merges in the schedule drops below this number.  This ensures that no
>> > more merges will be scheduled.
>> >
>> > By setting maxMergeCount higher than the number of merges that are
>> > expected in the schedule, you can ensure that indexing will never be
>> > paused.  It would require very atypical merge policy settings for the
>> > number of scheduled merges to ever reach six.  On my own indexing, I
>> > reached three scheduled merges quite frequently.  The default setting
>> > for maxMergeCount is three.
>> >
>> > The maxThreadCount setting controls how many of the scheduled merges
>> > will be simultaneously executed. With index data on standard spinning
>> > disks, you do not want to increase this number beyond 1, or you will
>> > have a performance problem due to thrashing disk heads.  If your data
>> > is on SSD, you can make it larger than 1.
>> >
>> > Thanks,
>> > Shawn
>>
>>
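
The two points quoted above (the Javadoc heuristic for maxThreadCount, and
the difference between scheduled and running merges) can be sketched as
follows. This is only a toy model under stated assumptions: the function
names are hypothetical, the SSD flag is passed in rather than detected, and
the pause rule simply follows Shawn's description (indexing pauses once the
number of scheduled merges reaches maxMergeCount). It is not the actual
ConcurrentMergeScheduler code.

```python
def default_max_thread_count(is_ssd, cpu_core_count):
    """Heuristic from the quoted Javadoc: max(1, min(4, cpuCoreCount/2)) on
    an SSD, otherwise 1. Integer division mirrors Java's int arithmetic.

    Hypothetical helper; SSD detection itself is not modeled here.
    """
    if not is_ssd:
        return 1
    return max(1, min(4, cpu_core_count // 2))


def merge_state(scheduled_merges, max_merge_count, max_thread_count):
    """Toy model of the two settings, following the explanation above.

    Returns (running, waiting, indexing_paused):
      running         - merges executing now, capped by max_thread_count
      waiting         - merges scheduled but not yet executing
      indexing_paused - True once scheduled merges reach max_merge_count
    """
    running = min(scheduled_merges, max_thread_count)
    waiting = scheduled_merges - running
    indexing_paused = scheduled_merges >= max_merge_count
    return running, waiting, indexing_paused


# Spinning disk: one merge thread regardless of core count.
print(default_max_thread_count(is_ssd=False, cpu_core_count=16))

# Three merges scheduled, maxMergeCount=6, maxThreadCount=1:
# one runs, two wait, and indexing keeps going.
print(merge_state(scheduled_merges=3, max_merge_count=6, max_thread_count=1))
```

With maxMergeCount above the number of merges that ever pile up, the third
value stays False, which is why a maxMergeCount larger than maxThreadCount
is still useful: the extra merges wait in the schedule instead of pausing
indexing.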
