Thanks, Erick! Great details as always :)

> On Mar 13, 2019, at 8:48 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> Wei:
> 
> Right. You should count on the _entire_ index being replicated from the 
> leader, but only after the optimize is done. Pre 7.5, this would be a single 
> segment; in 7.5+ it would be a bunch of 5G files unless you specified that the 
> optimize create some number of segments.
> 
> But unless you
> 1> have an unreasonable number of deleted docs in your index
> or
> 2> can demonstrate improved speed after optimize (and are willing to do it 
> regularly)
> 
> I wouldn’t bother.
> 
> Aroop:
> 
> Well, optimizing is really never recommended if you can help it ;). By “help 
> it” here I mean the number of deleted documents is a “reasonable” percentage 
> of your index, where _you_ define what “reasonable” means. Another bit that 
> came along with Solr 7.5 is that the percentage of deleted documents should 
> be smaller than pre 7.5 in some cases.
> 
> It was relatively easy, for instance, to have indexes approaching 50% deleted 
> documents pre 7.5. Things had to happen “just right” for that case, but it 
> was possible.
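
A minimal sketch of the deleted-document percentage discussed above, assuming you read the numDocs (live documents) and maxDoc (live plus deleted documents) figures that Solr reports for a core. The function names and the 30% default threshold are hypothetical; what counts as "reasonable" is for you to define.

```python
def deleted_doc_percentage(max_doc: int, num_docs: int) -> float:
    """Return the share of the index taken up by deleted documents, in percent.

    max_doc  -- live plus deleted documents (assumed to come from Solr's maxDoc)
    num_docs -- live documents only (assumed to come from Solr's numDocs)
    """
    if max_doc < 0 or num_docs < 0 or num_docs > max_doc:
        raise ValueError("expected 0 <= num_docs <= max_doc")
    if max_doc == 0:
        return 0.0  # an empty index has no deleted documents
    return 100.0 * (max_doc - num_docs) / max_doc


def worth_considering_optimize(max_doc: int, num_docs: int,
                               threshold_pct: float = 30.0) -> bool:
    """Hypothetical rule of thumb: flag the index only above the threshold."""
    return deleted_doc_percentage(max_doc, num_docs) > threshold_pct
```

With these definitions, an index with maxDoc=200 and numDocs=100 is at 50% deleted documents, the pre-7.5 worst case mentioned above.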
> 
> When bulk indexing for instance, if what you’re doing is replacing all the 
> docs you should have a minuscule number of deleted docs and I wouldn’t bother.
> 
> As always, if you can demonstrate that an optimized index returns searches 
> enough faster to matter in your particular situation, then the cost may be 
> worth it. Optimizing makes the most sense when you can do it regularly.
> 
> Best,
> Erick
> 
>> On Mar 12, 2019, at 10:51 PM, Aroop Ganguly 
>> <aroop_gang...@apple.com.INVALID> wrote:
>> 
>> Hi Erick
>> 
>> A related question: 
>> 
>> Is optimize then ill-advised for a bulk indexer post Solr 7.5?
>> Especially in a situation where an index is being modified over many days?
>> 
>> Thanks
>> Aroop
>> 
>>> On Mar 12, 2019, at 9:30 PM, Wei <weiwan...@gmail.com> wrote:
>>> 
>>> Thanks Erick, it's very helpful.  So for bulk indexing in a Tlog or
>>> Tlog/Pull cloud,  when we optimize at the end of updates, segments on the
>>> leader replica will change rapidly and the follower replicas will be
>>> continuously pulling from the leader, effectively downloading the whole
>>> index.  Is there a more efficient way?
>>> 
>>> On Mon, Mar 11, 2019 at 9:59 AM Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>> 
>>>> Do _not_ turn off hard commits, even when bulk indexing. Set
>>>> openSearcher to false in your config. This is for two reasons:
>>>> 1> the only time the transaction log is rolled over is when a hard commit
>>>> happens. If you turn off commits it’ll grow to a very large size.
>>>> 2> If, for any reason, the node restarts, it’ll replay the transaction log
>>>> from the last hard commit point, potentially taking hours if you haven’t
>>>> committed.
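
A rough sketch of the advice above (keep hard commits on, with openSearcher set to false), assuming the settings are changed through Solr's Config API rather than by editing solrconfig.xml. The function name and the 60000 ms interval are illustrative assumptions, not values from this thread.

```python
import json


def bulk_index_commit_settings(hard_commit_ms: int = 60000) -> dict:
    """Build a Config API 'set-property' command that keeps hard commits
    enabled but does not open a new searcher on each one.

    hard_commit_ms is an illustrative interval; tune it for your own load.
    """
    return {
        "set-property": {
            # Hard commits still roll over the transaction log regularly.
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
            # No new searcher on hard commit during bulk indexing.
            "updateHandler.autoCommit.openSearcher": False,
        }
    }


# JSON body that could be POSTed to /solr/<collection>/config.
payload = json.dumps(bulk_index_commit_settings())
```

The block only builds the request body; sending it, and choosing the interval, is left to your own tooling.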
>>>> 
>>>> And you should probably open  a new searcher occasionally, even while bulk
>>>> indexing. For Real Time Get there are some internal structures that grow in
>>>> proportion to the docs indexed since the last searcher was opened.
>>>> 
>>>> And for your other questions:
>>>> <1> I believe so, try it and look at your solr log.
>>>> 
>>>> <2> Yes. Have you looked at Mike’s video (the third one down) here:
>>>> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html?
>>>> TieredMergePolicy is the third video. The merge policy combines like-sized
>>>> segments. It’s wasteful to rewrite, say, a 19G segment just to add a 1G one, so
>>>> having multiple segments < 20G is perfectly normal.
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>>> On Mar 10, 2019, at 10:36 PM, Wei <weiwan...@gmail.com> wrote:
>>>>> 
>>>>> A side question, for heavy bulk indexing, what's the recommended setting
>>>>> for auto commit? As there is no query needed during the bulk indexing
>>>>> process, I have auto soft commit disabled. Is there any side effect if I
>>>>> also disable auto commit?
>>>>> 
>>>>> On Sun, Mar 10, 2019 at 10:22 PM Wei <weiwan...@gmail.com> wrote:
>>>>> 
>>>>>> Thanks Erick.
>>>>>> 
>>>>>> 1> TLOG replicas shouldn’t optimize on the follower. They should optimize
>>>>>> on the leader then replicate the entire index to the follower.
>>>>>> 
>>>>>> Does that mean the follower will ignore the optimize request? Or shall I
>>>>>> send the optimize request only to one of the leaders?
>>>>>> 
>>>>>> 2> As of Solr 7.5, optimize should not optimize to a single segment
>>>>>> _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set
>>>>>> maxSegments on the optimize command.
>>>>>> 
>>>>>> -- Is the 5G limit controlled by maxMergedSegmentMB setting? In
>>>>>> solrconfig.xml I used these settings:
>>>>>> 
>>>>>> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>>>>>>    <int name="maxMergeAtOnceExplicit">100</int>
>>>>>>    <int name="maxMergeAtOnce">10</int>
>>>>>>    <int name="segmentsPerTier">10</int>
>>>>>>    <double name="maxMergedSegmentMB">20480</double>
>>>>>> </mergePolicyFactory>
>>>>>> 
>>>>>> But in the end I see multiple segments much smaller than the 20GB limit.
>>>>>> In 7.6 is it required to explicitly set the number of segments to 1? E.g.,
>>>>>> shall I use
>>>>>> 
>>>>>> /update?optimize=true&waitSearcher=false&maxSegments=1
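
For illustration, a small sketch of how the request above could be assembled, assuming a hypothetical base URL and collection name. The helper name is made up; `optimize`, `waitSearcher`, and `maxSegments` are the parameters from the URL in this message.

```python
from typing import Optional
from urllib.parse import urlencode


def optimize_url(base_url: str, collection: str,
                 max_segments: Optional[int] = None,
                 wait_searcher: bool = False) -> str:
    """Build an /update URL that requests an optimize.

    base_url and collection are placeholders for your own deployment.
    Leaving max_segments as None omits the parameter entirely.
    """
    params = {
        "optimize": "true",
        "waitSearcher": str(wait_searcher).lower(),
    }
    if max_segments is not None:
        params["maxSegments"] = str(max_segments)
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = optimize_url("http://localhost:8983/solr", "mycollection", max_segments=1)
```

The block only builds the URL; it does not send the request.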
>>>>>> 
>>>>>> Best,
>>>>>> Wei
>>>>>> 
>>>>>> 
>>>>>> On Fri, Mar 8, 2019 at 12:29 PM Erick Erickson <erickerick...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> This is very odd for at least two reasons:
>>>>>>> 
>>>>>>> 1> TLOG replicas shouldn’t optimize on the follower. They should optimize
>>>>>>> on the leader then replicate the entire index to the follower.
>>>>>>> 
>>>>>>> 2> As of Solr 7.5, optimize should not optimize to a single segment
>>>>>>> _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set
>>>>>>> maxSegments on the optimize command.
>>>>>>> 
>>>>>>> So if you can reliably reproduce this, it’s probably worth a JIRA.
>>>>>>> 
>>>>>>>> On Mar 8, 2019, at 11:21 AM, Wei <weiwan...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Recently I encountered a strange issue with optimize in Solr 7.6. The cloud
>>>>>>>> is created with 4 shards with 2 Tlog replicas per shard. After a batch index
>>>>>>>> update I issue an optimize command to a randomly picked replica in the
>>>>>>>> cloud.  After a while when I check, all the non-leader Tlog replicas have
>>>>>>>> finished optimization to a single segment, however all the leader replicas
>>>>>>>> still have multiple segments.  Previously, in the all-NRT replica cloud, I
>>>>>>>> saw optimization triggered on all nodes.  Is the optimization process
>>>>>>>> different with Tlog/Pull replicas?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Wei
>>>>>>> 
>>>>>>> 
>>>> 
>>>> 
>> 
> 
