I should also mention that, apart from committing, the pipeline also does a
bunch of deletes for stale documents (based on a custom version field). The
# of deletes can be very significant, causing the % of deleted documents to
easily reach 40-50% of the index itself.
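
(In case it helps anyone reproduce this: if I remember right, the same
per-core numbers can be read off the core admin screen or pulled from the
Luke handler, e.g.

    http://<host>:8983/solr/<core>/admin/luke?numTerms=0&wt=json

which reports numDocs, maxDoc and deletedDocs for the core -- the % above is
roughly deletedDocs/maxDoc. Host/core names are placeholders.)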

Thanks
KNitin


On Sun, Feb 23, 2014 at 12:02 AM, KNitin <nitin.t...@gmail.com> wrote:

> Commit parameters: the server does an auto commit every 30 seconds with
> openSearcher=false. The pipeline does a hard commit only at the very end
> of its run.
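>
> (For reference, this roughly corresponds to something like the following
> in solrconfig.xml -- the snippet is illustrative rather than copied from
> our config:
>
>     <autoCommit>
>       <maxTime>30000</maxTime>
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>
> i.e. a hard commit fires at most every 30 seconds without opening a new
> searcher.)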
>
> The high CPU issue I am seeing is only during the reads and not during the
> writes. Right now I see a direct correlation between latencies and the # of
> segments, at least for a few large collections. I will post back if the
> theory is invalidated.
>
> Thanks
> - Nitin
>
>
> On Sat, Feb 22, 2014 at 10:01 PM, Erick Erickson 
> <erickerick...@gmail.com> wrote:
>
>> Well, it's always possible. I wouldn't expect the search time/CPU
>> utilization to increase with # segments, within reasonable limits.
>> At some point, the important parts of the index get read into memory
>> and the number of segments is pretty irrelevant. You do mention
>> that you have a heavy ingestion pipeline, which leads me to wonder
>> whether you're committing too often. What are your commit
>> parameters?
>>
>> For % deleted docs, I'm really talking about deletedDocs/numDocs.
>>
>> I suppose the interesting question is whether the CPU utilization you're
>> seeing is _always_ correlated with the # of segments, or whether certain
>> machines always show high CPU utilization. I suppose you could issue a
>> commit and see what difference that makes.
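>>
>> If you want to try that, an explicit commit is just an update request,
>> something like
>>
>>     http://<host>:8983/solr/<collection>/update?commit=true
>>
>> (sketch only -- substitute your own host/collection).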
>>
>> I rather doubt that the # of segments is the underlying issue, but that's
>> nothing but a SWAG...
>>
>> Best,
>> Erick
>>
>>
>>
>>
>> On Sat, Feb 22, 2014 at 6:16 PM, KNitin <nitin.t...@gmail.com> wrote:
>>
>> > Thanks, Erick.
>> >
>> > *2> There are, but you'll have to dig.*
>> >
>> >    >> Any pointers on where to get started?
>> >
>> >
>> >
>> > *3> Well, I'd ask a counter-question. Are you seeing unacceptable
>> > performance? If not, why worry? :)*
>> >
>> >    >> When you say %, do you refer to deletedDocs/numDocs or
>> > deletedDocs/maxDoc? To answer your question, yes, I see some of our
>> > shards taking 3x more time and 3x more CPU than other shards for the
>> > same queries and the same number of hits (all shards have the exact
>> > same number of docs, but I see a few shards having more deleted
>> > documents than the rest).
>> >
>> > My understanding is that the search time/CPU would increase with the #
>> > of segments? The core of my issue is that a few nodes are running with
>> > extremely high CPU (90%+) and the rest are running under 30% CPU, and
>> > the only difference between them is the # of segments in the shards on
>> > the machines. The nodes running hot have shards with 30 segments, and
>> > the ones running with lower CPU contain 20 segments and far fewer
>> > deleted documents.
>> >
>> > Is it possible that a difference of 10 segments could impact CPU/search
>> > time?
>> >
>> > Thanks
>> > - Nitin
>> >
>> >
>> > On Sat, Feb 22, 2014 at 4:36 PM, Erick Erickson <erickerick...@gmail.com>
>> > wrote:
>> >
>> > > 1> It Depends. Soft commits will not add a new segment. Hard commits
>> > > with openSearcher=true or false _will_ create a new segment.
>> > > 2> There are, but you'll have to dig -- see below for a rough pointer.
>> > > 3> Well, I'd ask a counter-question. Are you seeing unacceptable
>> > > performance? If not, why worry? :)
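>> > >
>> > > (On 2>: the closest thing to "digging" that I know of is the stats on
>> > > the MBeans handler, e.g. something like
>> > >
>> > >     http://<host>:8983/solr/<core>/admin/mbeans?stats=true&cat=CORE&wt=json
>> > >
>> > > and cat=UPDATEHANDLER for commit counts. Actual merge activity is
>> > > easiest to see by enabling <infoStream> in the <indexConfig> section
>> > > of solrconfig.xml, which logs the merge policy/scheduler decisions.
>> > > Exact URLs/params above are from memory, so double-check them.)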
>> > >
>> > > A better answer to 3> is that 24-28 segments is not at all unusual.
>> > >
>> > > By and large, don't bother with optimize/force merge. What I would do
>> > > is look at the admin screen and note the percentage of deleted
>> > > documents. If it's above some arbitrary number (I typically use
>> > > 15-20%) and _stays_ there, consider optimizing.
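>> > >
>> > > If you do decide to optimize, it's just an update request, roughly
>> > >
>> > >     http://<host>:8983/solr/<collection>/update?optimize=true&maxSegments=1
>> > >
>> > > (a sketch, not copied from anywhere -- and remember it rewrites the
>> > > whole index, so it's expensive and best done off-peak).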
>> > >
>> > > However! There is a parameter you can explicitly set in solrconfig.xml
>> > > (sorry, which one escapes me now) that increases the "weight" of the %
>> > > of deleted documents when the merge policy decides which segments to
>> > > merge. Upping this number will have the effect of more aggressively
>> > > merging segments with a greater % of deleted docs. But such segments
>> > > are already pretty heavily weighted for merging...
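>> > >
>> > > I _think_ the one I mean is reclaimDeletesWeight on TieredMergePolicy
>> > > (default 2.0). Something along these lines in the <indexConfig>
>> > > section -- syntax from memory, so verify against your Solr version:
>> > >
>> > >     <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>> > >       <int name="maxMergeAtOnce">10</int>
>> > >       <int name="segmentsPerTier">10</int>
>> > >       <double name="reclaimDeletesWeight">3.0</double>
>> > >     </mergePolicy>
>> > >
>> > > Higher values bias merges toward segments with more deleted docs.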
>> > >
>> > >
>> > > Best,
>> > > Erick
>> > >
>> > >
>> > > On Sat, Feb 22, 2014 at 1:23 PM, KNitin <nitin.t...@gmail.com> wrote:
>> > >
>> > > > Hi
>> > > >
>> > > >   I have the following questions
>> > > >
>> > > >
>> > > >    1. I have a job that runs for 3-4 hours continuously committing
>> > > >    data to a collection with an auto commit of 30 seconds. Does it
>> > > >    mean that every 30 seconds I would get a new Solr segment?
>> > > >    2. My current segment merge policy is set to 10. Will the merger
>> > > >    always continue running in the background to reduce the segments?
>> > > >    Is there a way to see metrics regarding segment merging from Solr
>> > > >    (MBeans or any other way)?
>> > > >    3. A few of my collections are very large, with around 24-28
>> > > >    segments per shard and around 16 shards. Is it bad to have this
>> > > >    many segments per shard for a collection? Is it a good practice
>> > > >    to optimize the index very often, or just rely on segment merges
>> > > >    alone?
>> > > >
>> > > >
>> > > >
>> > > > Thanks for the help in advance
>> > > > Nitin
>> > > >
>> > >
>> >
>>
>
>
