Hi Erick,

A related question:
Is optimize then ill-advised for a bulk indexer after Solr 7.5? Especially in a situation where an index is being modified over many days?

Thanks,
Aroop

> On Mar 12, 2019, at 9:30 PM, Wei <weiwan...@gmail.com> wrote:
>
> Thanks Erick, it's very helpful. So for bulk indexing in a Tlog or
> Tlog/Pull cloud, when we optimize at the end of updates, segments on the
> leader replica will change rapidly and the follower replicas will be
> continuously pulling from the leader, effectively downloading the whole
> index. Is there a more efficient way?
>
> On Mon, Mar 11, 2019 at 9:59 AM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Do _not_ turn off hard commits, even when bulk indexing. Set
>> openSearcher to false in your config. This is for two reasons:
>> 1> The only time the transaction log is rolled over is when a hard commit
>> happens. If you turn off commits it’ll grow to a very large size.
>> 2> If, for any reason, the node restarts, it’ll replay the transaction log
>> from the last hard commit point, potentially taking hours if you haven’t
>> committed.
>>
>> And you should probably open a new searcher occasionally, even while bulk
>> indexing. For Real Time Get there are some internal structures that grow in
>> proportion to the docs indexed since the last searcher was opened.
>>
>> And for your other questions:
>> <1> I believe so, try it and look at your Solr log.
>>
>> <2> Yes. Have you looked at Mike’s video (the third one down) here:
>> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html?
>> TieredMergePolicy is the third video. The merge policy combines like-sized
>> segments. It’s wasteful to rewrite, say, a 19G segment just to add a 1G one, so
>> having multiple segments < 20G is perfectly normal.
>>
>> Best,
>> Erick
>>
>>> On Mar 10, 2019, at 10:36 PM, Wei <weiwan...@gmail.com> wrote:
>>>
>>> A side question: for heavy bulk indexing, what's the recommended setting
>>> for auto commit?
>>> As there is no query needed during the bulk indexing
>>> process, I have auto soft commit disabled. Is there any side effect if I
>>> also disable auto commit?
>>>
>>> On Sun, Mar 10, 2019 at 10:22 PM Wei <weiwan...@gmail.com> wrote:
>>>
>>>> Thanks Erick.
>>>>
>>>> 1> TLOG replicas shouldn’t optimize on the follower. They should optimize
>>>> on the leader then replicate the entire index to the follower.
>>>>
>>>> Does that mean the follower will ignore the optimize request? Or shall I
>>>> send the optimize request only to one of the leaders?
>>>>
>>>> 2> As of Solr 7.5, optimize should not optimize to a single segment
>>>> _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set
>>>> maxSegments on the optimize command.
>>>>
>>>> -- Is the 5G limit controlled by the maxMergedSegmentMB setting? In
>>>> solrconfig.xml I used these settings:
>>>>
>>>> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>>>>   <int name="maxMergeAtOnceExplicit">100</int>
>>>>   <int name="maxMergeAtOnce">10</int>
>>>>   <int name="segmentsPerTier">10</int>
>>>>   <double name="maxMergedSegmentMB">20480</double>
>>>> </mergePolicyFactory>
>>>>
>>>> But in the end I see multiple segments much smaller than the 20GB limit.
>>>> In 7.6 is it required to explicitly set the number of segments to 1? E.g.
>>>> shall I use
>>>>
>>>> /update?optimize=true&waitSearcher=false&maxSegments=1
>>>>
>>>> Best,
>>>> Wei
>>>>
>>>>
>>>> On Fri, Mar 8, 2019 at 12:29 PM Erick Erickson <erickerick...@gmail.com>
>>>> wrote:
>>>>
>>>>> This is very odd for at least two reasons:
>>>>>
>>>>> 1> TLOG replicas shouldn’t optimize on the follower. They should optimize
>>>>> on the leader then replicate the entire index to the follower.
>>>>>
>>>>> 2> As of Solr 7.5, optimize should not optimize to a single segment
>>>>> _unless_ that segment is < 5G. See LUCENE-7976. Or you explicitly set
>>>>> maxSegments on the optimize command.
>>>>>
>>>>> So if you can reliably reproduce this, it’s probably worth a JIRA.
>>>>>
>>>>>> On Mar 8, 2019, at 11:21 AM, Wei <weiwan...@gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Recently I encountered a strange issue with optimize in Solr 7.6. The cloud
>>>>>> is created with 4 shards with 2 Tlog replicas per shard. After a batch index
>>>>>> update I issue an optimize command to a randomly picked replica in the
>>>>>> cloud. After a while when I check, all the non-leader Tlog replicas
>>>>>> finished optimization to a single segment, however all the leader replicas
>>>>>> still have multiple segments. Previously, in the all-NRT replica cloud, I
>>>>>> saw optimization triggered on all nodes. Is the optimization process
>>>>>> different with Tlog/Pull replicas?
>>>>>>
>>>>>> Best,
>>>>>> Wei
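
For reference, the commit advice in this thread (keep hard commits on during bulk indexing, with openSearcher set to false, and soft commits disabled as Wei describes) might look roughly like this in solrconfig.xml. This is only a sketch: the 60-second interval is an assumed value that nobody in the thread recommended, and both elements are assumed to sit inside the existing <updateHandler> section alongside whatever else is already configured there.

```xml
<!-- Goes inside the existing <updateHandler> element of solrconfig.xml. -->

<!-- Hard commit: rolls over the transaction log.
     maxTime is in milliseconds; 60000 is an assumed value, tune as needed. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <!-- Do not open a new searcher on each hard commit. -->
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit disabled (-1), for the case where no queries run during indexing. -->
<autoSoftCommit>
  <maxTime>-1</maxTime>
</autoSoftCommit>
```

With a setup like this, no searcher is reopened automatically, so Erick's point about opening a new searcher occasionally would still need to be handled separately, for example with an explicit commit from the indexing client from time to time.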