Well, it's always possible. I wouldn't expect the search time/CPU
utilization to increase with # segments, within reasonable limits.
At some point, the important parts of the index get read into memory
and the number of segments is pretty irrelevant. You do mention
that you have a heavy ingestion pipeline, which leads me to wonder
whether you're committing too often. What are your commit
parameters?

For % deleted docs, I'm really talking about deletedDocs/numDocs.
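
As a rough illustration of that ratio, here's a minimal Python sketch. It
assumes you've already read numDocs and maxDoc for each shard (the admin
screen shows both) and takes deletedDocs to be maxDoc - numDocs. The
function name and the sample numbers are hypothetical, just for illustration.

```python
def deleted_ratio(num_docs: int, max_doc: int) -> float:
    """Return deletedDocs/numDocs, with deletedDocs taken as maxDoc - numDocs."""
    if num_docs < 0 or max_doc < num_docs:
        raise ValueError("expected 0 <= numDocs <= maxDoc")
    if num_docs == 0:
        # No live documents: the ratio is undefined, so report 0.0 for an
        # empty index and infinity when only deleted documents remain.
        return 0.0 if max_doc == 0 else float("inf")
    return (max_doc - num_docs) / num_docs


if __name__ == "__main__":
    # Hypothetical numbers for two shards, for illustration only.
    shards = {"shard1": (1_000_000, 1_050_000), "shard2": (1_000_000, 1_300_000)}
    for name, (num_docs, max_doc) in shards.items():
        print(f"{name}: {deleted_ratio(num_docs, max_doc):.1%} deleted")
```

Comparing this number across the hot and cool shards is a quick way to see
whether deleted documents line up with the CPU difference.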

I suppose the interesting question is whether the CPU utilization you're
seeing is _always_ correlated with # segments, or whether certain
machines always have the high CPU utilization. I suppose you could
issue a commit and see what difference that makes.

I rather doubt that the # of segments is the underlying issue, but that's
nothing but a SWAG...

Best,
Erick




On Sat, Feb 22, 2014 at 6:16 PM, KNitin <nitin.t...@gmail.com> wrote:

> Thanks, Erick.
>
> *2> There are, but you'll have to dig. *
>
>    >> Any pointers on where to get started?
>
>
>
> *3> Well, I'd ask a counter-question. Are you seeing
> unacceptable performance? If not, why worry? :)*
>
>    >> When you say %, do you mean deleted_docs/NumDocs or
> deleted_docs/Max_docs? To answer your question, yes, I see some of our
> shards taking 3x more time and 3x more CPU than other shards for the same
> queries and same number of hits (all shards have the exact same number of
> docs, but I see a few shards having more deleted documents than the rest).
>
> My understanding is that the search time/CPU would increase with # of
> segments? The core of my issue is that a few nodes are running with
> extremely high CPU (90%+) and the rest are running under 30% CPU, and the
> only difference between the two is the # of segments in the shards on the
> machines. The nodes running hot have shards with 30 segments and the ones
> running with lower CPU contain 20 segments and far fewer deleted
> documents.
>
> Is it possible that a difference of 10 segments could impact CPU/search
> time?
>
> Thanks
> - Nitin
>
>
> On Sat, Feb 22, 2014 at 4:36 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > 1> It Depends. Soft commits will not add a new segment. Hard commits
> > with openSearcher=true or false _will_ create a new segment.
> > 2> There are, but you'll have to dig.
> > 3> Well, I'd ask a counter-question. Are you seeing unacceptable
> > performance? If not, why worry? :)
> >
> > A better answer is that 24-28 segments is not at all unusual.
> >
> > By and large, don't bother with optimize/force merge. What I would do is
> > look at the admin screen and note the percentage of deleted documents.
> > If it's above some arbitrary number (I typically use 15-20%) and _stays_
> > there, consider optimizing.
> >
> > However! There is a parameter you can explicitly set in solrconfig.xml
> > (sorry, which one escapes me now) that increases the "weight" of the %
> > deleted documents when the merge policy decides which segments
> > to merge. Upping this number will have the effect of more aggressively
> > merging segments with a greater % of deleted docs. But these are already
> > pretty heavily weighted for merging...
> >
> >
> > Best,
> > Erick
> >
> >
> > On Sat, Feb 22, 2014 at 1:23 PM, KNitin <nitin.t...@gmail.com> wrote:
> >
> > > Hi
> > >
> > >   I have the following questions
> > >
> > >
> > >    1. I have a job that runs for 3-4 hours continuously committing data to
> > >    a collection with auto commit of 30 seconds. Does it mean that every 30
> > >    seconds I would get a new solr segment ?
> > >    2. My current segment merge policy is set to 10. Will merger always
> > >    continue running in the background to reduce the segments ? Is there a
> > >    way to see metrics regarding segment merging from solr (mbeans or any
> > >    other way)?
> > >    3. A few of my collections are very large with around 24-28 segments per
> > >    shard and around 16 shards. Is it bad to have this many segments for a
> > >    shard for a collection? Is it a good practice to optimize the index very
> > >    often or just rely on segment merges alone?
> > >
> > >
> > >
> > > Thanks for the help in advance
> > > Nitin
> > >
> >
>
