Hi Erick,

We are looking into TLOG/PULL replicas, but I have some doubts regarding segments. Can you explain what triggers the creation of a new segment, and how large a segment can grow?

This is my current index config:

maxMergeAtOnce - 20
segmentsPerTier - 20
ramBufferSizeMB - 512
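For reference, this is roughly how those settings look in our solrconfig.xml (a sketch; element names as per the TieredMergePolicyFactory documentation):

<indexConfig>
  <!-- flush a new segment once the in-memory buffer reaches 512 MB -->
  <ramBufferSizeMB>512</ramBufferSizeMB>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <!-- maximum number of segments merged in one merge operation -->
    <int name="maxMergeAtOnce">20</int>
    <!-- segments allowed per tier before a merge is triggered -->
    <int name="segmentsPerTier">20</int>
  </mergePolicyFactory>
</indexConfig>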
Can these settings be tuned to reduce disk reads during segment merging? For example, increasing segmentsPerTier may help, but a large number of segments may hurt search performance. And as per the documentation, filling the ramBufferSizeMB buffer triggers the flush of a new segment (which can in turn trigger merging), so maybe that can be tweaked as well.

One more question: the graph below plots indexing time against core size (0-100 GB). Commits were happening automatically every 100k records.

[image: image.png]

As you can see, the spikes get denser as the core size grows, and once a core reaches ~100 GB, indexing becomes really slow. Why is this happening? Do we need to put a limit on how large each core can grow?

On Fri, Jun 5, 2020 at 5:59 PM Erick Erickson <erickerick...@gmail.com> wrote:

> Have you considered TLOG/PULL replicas rather than NRT replicas?
> That way, all the indexing happens on a single machine and you can
> use shards.preference to confine the searches to the PULL replicas,
> see: https://lucene.apache.org/solr/guide/7_7/distributed-requests.html
>
> No, you can’t really limit the number of segments. While that seems like a
> good idea, it quickly becomes counter-productive. Say you require that you
> have 10 segments, and each one becomes 10G. What happens when the 11th
> segment is created and it’s 100M? Do you rewrite one of the 10G segments
> just to add 100M? Your problem gets worse, not better.
>
> Best,
> Erick
>
> > On Jun 5, 2020, at 1:41 AM, Anshuman Singh <singhanshuma...@gmail.com> wrote:
> >
> > Hi Nicolas,
> >
> > Commit happens automatically at 100k documents. We don't commit explicitly.
> > We didn't limit the number of segments. There are 35+ segments in each core.
> > But unrelated to the question, I would like to know if we can limit the
> > number of segments in a core. I tried it in the past, but the merge
> > policies don't allow that.
> > The TieredMergePolicy has two parameters, maxMergeAtOnce and
> > segmentsPerTier. It seems we cannot control the total number of
> > segments, only the segments per tier. (
> > http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> > )
> >
> > On Thu, Jun 4, 2020 at 5:48 PM Nicolas Franck <nicolas.fra...@ugent.be> wrote:
> >
> >> The real questions are:
> >>
> >> * how often do you commit (either explicitly or automatically)?
> >> * how many segments do you allow? If you only allow 1 segment,
> >> then that whole segment is recreated using the old documents and the
> >> updates. And yes, that requires reading the old segment.
> >> It is common to allow multiple segments when you update often,
> >> so updating does not interfere with reading the index too often.
> >>
> >>> On 4 Jun 2020, at 14:08, Anshuman Singh <singhanshuma...@gmail.com> wrote:
> >>>
> >>> I noticed that while indexing, when a commit happens, there is a high
> >>> disk read by Solr. The problem is that this impacts search performance
> >>> when the index is loaded from disk for a query, as the disk read speed
> >>> is not very good and the whole index is not cached in RAM.
> >>>
> >>> When no searching is performed, I noticed that the disk is usually read
> >>> during commit operations, and sometimes even without a commit, at a low
> >>> rate. I guess it is read due to segment merge operations. Can it be
> >>> something else?
> >>> If it is merging, can we limit disk IO during merging?
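P.S. Just to confirm my understanding of the shards.preference suggestion above: with TLOG/PULL replicas, we would route searches to the PULL replicas with a request along these lines (host and collection name are placeholders)?

http://localhost:8983/solr/my_collection/select?q=*:*&shards.preference=replica.type:PULL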