Hi Eric,

We are looking into TLOG/PULL replicas, but I still have some doubts about
segments. Can you explain what causes a new segment to be created and how
large a segment can grow?
Here is my current index config:
maxMergeAtOnce - 20
segmentsPerTier - 20
ramBufferSizeMB - 512
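
For reference, this is roughly how those settings sit in our solrconfig.xml
(a sketch from memory; the element names follow the TieredMergePolicyFactory
documentation rather than our exact file):

<indexConfig>
  <!-- flush the in-memory buffer to a new segment once it reaches 512 MB -->
  <ramBufferSizeMB>512</ramBufferSizeMB>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">20</int>  <!-- max segments merged in one merge -->
    <int name="segmentsPerTier">20</int> <!-- segments allowed per tier -->
  </mergePolicyFactory>
</indexConfig>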

Can these settings be tuned to keep disk reads low during segment merging?
For example, increasing segmentsPerTier should make merges less frequent,
but a large number of segments may hurt search performance. And per the
documentation, flushing at ramBufferSizeMB creates new segments and can
trigger merging, so maybe that can be tweaked as well.

One more question:
The graph below shows indexing time against core size (0-100G). Commits
were happening automatically every 100k records.
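
The commits come from autoCommit on document count; the relevant
solrconfig.xml block looks roughly like this (a sketch; the openSearcher
value is an assumption on my part):

<autoCommit>
  <maxDocs>100000</maxDocs>          <!-- hard commit every 100k documents -->
  <openSearcher>false</openSearcher> <!-- assumption: no new searcher on hard commit -->
</autoCommit>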

[image: graph of indexing time vs. core size (0-100G)]

As you can see, the density of spikes increases as the core grows. By the
time a core reaches ~100 G, indexing becomes really slow. Why is this
happening? Do we need to cap how large each core can grow?


On Fri, Jun 5, 2020 at 5:59 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> Have you considered TLOG/PULL replicas rather than NRT replicas?
> That way, all the indexing happens on a single machine and you can
> use shards.preference to confine searches to the PULL replicas,
> see: https://lucene.apache.org/solr/guide/7_7/distributed-requests.html
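>
> For example (host and collection names below are placeholders), a request
> like
>
>   http://host:8983/solr/collection/select?q=*:*&shards.preference=replica.type:PULL
>
> keeps the query on PULL replicas whenever one is available.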
>
> No, you can’t really limit the number of segments. While that seems like a
> good idea, it quickly becomes counter-productive. Say you require exactly
> 10 segments, and each grows to 10G. What happens when the 11th segment is
> created and it’s 100M? Do you rewrite one of the 10G segments just to add
> 100M? Your problem gets worse, not better.
>
>
> Best,
> Erick
>
> > On Jun 5, 2020, at 1:41 AM, Anshuman Singh <singhanshuma...@gmail.com>
> wrote:
> >
> > Hi Nicolas,
> >
> > Commit happens automatically at 100k documents. We don't commit
> > explicitly. We didn't limit the number of segments; there are 35+
> > segments in each core.
> > But, unrelated to the question, I would like to know if we can limit the
> > number of segments in a core. I tried it in the past, but the merge
> > policies don't allow that. TieredMergePolicy has two parameters,
> > maxMergeAtOnce and segmentsPerTier. It seems we cannot control the total
> > number of segments, only the segments per tier (see
> > http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html).
> >
> >
> > On Thu, Jun 4, 2020 at 5:48 PM Nicolas Franck <nicolas.fra...@ugent.be>
> > wrote:
> >
> >> The real questions are:
> >>
> >> * how often do you commit (either explicitly or automatically)?
> >> * how many segments do you allow? If you only allow 1 segment,
> >>  then that whole segment is rewritten from the old documents plus the
> >>  updates. And yes, that requires reading the old segment.
> >>  It is common to allow multiple segments when you update often,
> >>  so that updating does not interfere with reading the index too often.
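> >>  (If you really did force everything into one segment, say with an
> >>  explicit optimize, something like
> >>    curl 'http://localhost:8983/solr/yourcore/update?optimize=true&maxSegments=1'
> >>  where the host and core names are placeholders, the whole index would
> >>  be rewritten, which is exactly the heavy IO you want to avoid.)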
> >>
> >>
> >>> On 4 Jun 2020, at 14:08, Anshuman Singh <singhanshuma...@gmail.com>
> >> wrote:
> >>>
> >>> I noticed that while indexing, when a commit happens, there is a high
> >>> disk read rate from Solr. The problem is that this hurts search
> >>> performance whenever a query has to load parts of the index from disk,
> >>> because our disk read speed is not great and the whole index is not
> >>> cached in RAM.
> >>>
> >>> Even when no searching is performed, I see disk reads during commit
> >>> operations, and sometimes a low rate of reads even without a commit. I
> >>> guess these reads come from segment merge operations. Could it be
> >>> something else?
> >>> If it is merging, can we limit disk IO during merging?
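> >>> One knob I have been eyeing, though I have not verified that it helps,
> >>> is the merge scheduler in solrconfig.xml, which can cap concurrent
> >>> merge work (the values below are illustrative):
> >>>
> >>> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
> >>>   <int name="maxThreadCount">1</int> <!-- fewer merge threads running at once -->
> >>>   <int name="maxMergeCount">6</int>  <!-- merges queued before indexing stalls -->
> >>> </mergeScheduler>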
> >>
> >>
>
>
