Hi Erick

A related question: 

Is optimize then ill-advised for bulk indexing after Solr 7.5,
especially in a situation where an index is being modified over many days?

Thanks
Aroop

> On Mar 12, 2019, at 9:30 PM, Wei <weiwan...@gmail.com> wrote:
> 
> Thanks Erick, it's very helpful.  So for bulk indexing in a Tlog or
> Tlog/Pull cloud,  when we optimize at the end of updates, segments on the
> leader replica will change rapidly and the follower replicas will be
> continuously pulling from the leader, effectively downloading the whole
> index.  Is there a more efficient way?
> 
> On Mon, Mar 11, 2019 at 9:59 AM Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
>> Do _not_ turn off hard commits, even when bulk indexing. Set
>> openSearcher to false in your config. This is for two reasons:
>> 1> The only time the transaction log is rolled over is when a hard commit
>> happens. If you turn off commits it’ll grow to a very large size.
>> 2> If, for any reason, the node restarts, it’ll replay the transaction log
>> from the last hard commit point, potentially taking hours if you haven’t
>> committed.
>> 
>> And you should probably open a new searcher occasionally, even while bulk
>> indexing. For Real Time Get there are some internal structures that grow in
>> proportion to the docs indexed since the last searcher was opened.
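>> 
>> A minimal sketch of those two settings in solrconfig.xml. The intervals
>> (60 seconds for hard commits, 10 minutes for soft commits) are illustrative
>> assumptions, not values from this thread, so tune them to your load:
>> 
>> ```xml
>> <!-- goes inside the <updateHandler> element of solrconfig.xml -->
>> <autoCommit>
>>   <!-- hard commit: rolls over the transaction log (assumed 60 s) -->
>>   <maxTime>60000</maxTime>
>>   <!-- do not open a new searcher on hard commit -->
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>> <autoSoftCommit>
>>   <!-- occasional soft commit to open a new searcher (assumed 10 min) -->
>>   <maxTime>600000</maxTime>
>> </autoSoftCommit>
>> ```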
>> 
>> And for your other questions:
>> <1> I believe so, try it and look at your solr log.
>> 
>> <2> Yes. Have you looked at Mike’s video (the third one down) here:
>> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html?
>> TieredMergePolicy is the third video. The merge policy combines like-sized
>> segments. It’s wasteful to rewrite, say, a 19G segment just to add a 1G one, so
>> having multiple segments < 20G is perfectly normal.
>> 
>> Best,
>> Erick
>> 
>>> On Mar 10, 2019, at 10:36 PM, Wei <weiwan...@gmail.com> wrote:
>>> 
>>> A side question, for heavy bulk indexing, what's the recommended setting
>>> for auto commit? As there is no query needed during the bulk indexing
>>> process, I have auto soft commit disabled. Is there any side effect if I
>>> also disable auto commit?
>>> 
>>> On Sun, Mar 10, 2019 at 10:22 PM Wei <weiwan...@gmail.com> wrote:
>>> 
>>>> Thanks Erick.
>>>> 
>>>> 1> TLOG replicas shouldn’t optimize on the follower. They should optimize
>>>> on the leader then replicate the entire index to the follower.
>>>> 
>>>> Does that mean the follower will ignore the optimize request? Or shall I
>>>> send the optimize request only to one of the leaders?
>>>> 
>>>> 2> As of Solr 7.5, optimize should not optimize to a single segment
>>>> _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set
>>>> maxSegments on the optimize command.
>>>> 
>>>> -- Is the 5G limit controlled by the maxMergedSegmentMB setting? In
>>>> solrconfig.xml I used these settings:
>>>> 
>>>> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>>>>      <int name="maxMergeAtOnceExplicit">100</int>
>>>>      <int name="maxMergeAtOnce">10</int>
>>>>      <int name="segmentsPerTier">10</int>
>>>>      <double name="maxMergedSegmentMB">20480</double>
>>>> </mergePolicyFactory>
>>>> 
>>>> But in the end I see multiple segments much smaller than the 20GB limit.
>>>> In 7.6 is it required to explicitly set the number of segments to 1? E.g.,
>>>> shall I use
>>>> 
>>>> /update?optimize=true&waitSearcher=false&maxSegments=1
>>>> 
>>>> Best,
>>>> Wei
>>>> 
>>>> 
>>>> On Fri, Mar 8, 2019 at 12:29 PM Erick Erickson <erickerick...@gmail.com>
>>>> wrote:
>>>> 
>>>>> This is very odd for at least two reasons:
>>>>> 
>>>>> 1> TLOG replicas shouldn’t optimize on the follower. They should optimize
>>>>> on the leader then replicate the entire index to the follower.
>>>>> 
>>>>> 2> As of Solr 7.5, optimize should not optimize to a single segment
>>>>> _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set
>>>>> maxSegments on the optimize command.
>>>>> 
>>>>> So if you can reliably reproduce this, it’s probably worth a JIRA.
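>>>>> 
>>>>> For reference, a hedged sketch of the explicit segment count from point 2,
>>>>> written as an XML update message POSTed to the collection's /update
>>>>> handler. In the update handler the count is passed as maxSegments; the
>>>>> value 1 and waitSearcher="false" are illustrative assumptions only:
>>>>> 
>>>>> ```xml
>>>>> <!-- request body for a POST to /update, Content-Type: text/xml -->
>>>>> <optimize maxSegments="1" waitSearcher="false"/>
>>>>> ```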
>>>>> 
>>>>>> On Mar 8, 2019, at 11:21 AM, Wei <weiwan...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Recently I encountered a strange issue with optimize in Solr 7.6. The cloud
>>>>>> is created with 4 shards with 2 Tlog replicas per shard. After a batch index
>>>>>> update I issue an optimize command to a randomly picked replica in the
>>>>>> cloud.  After a while when I check,  all the non-leader Tlog replicas
>>>>>> finished optimization to a single segment, however all the leader replicas
>>>>>> still have multiple segments.  Previously in the all-NRT replica cloud, I
>>>>>> see optimization is triggered on all nodes.  Is the optimization process
>>>>>> different with Tlog/Pull replicas?
>>>>>> 
>>>>>> Best,
>>>>>> Wei
>>>>> 
>>>>> 
>> 
>> 
