I should also mention that, apart from committing, the pipeline also issues a large number of deletes for stale documents (based on a custom version field). The number of deletes can be significant enough that deleted documents easily make up 40-50% of the index.
Thanks
KNitin

On Sun, Feb 23, 2014 at 12:02 AM, KNitin <nitin.t...@gmail.com> wrote:

> Commit parameters: the server does an auto commit every 30 seconds with
> openSearcher=false. The pipeline does a hard commit only at the very end
> of its run.
>
> The high CPU issue I am seeing occurs only during reads, not during
> writes. Right now I see a direct correlation between latencies and the
> number of segments, at least for a few large collections. Will post back
> if the theory is invalidated.
>
> Thanks
> - Nitin
>
> On Sat, Feb 22, 2014 at 10:01 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Well, it's always possible. I wouldn't expect search time/CPU
>> utilization to increase with the number of segments, within reasonable
>> limits. At some point, the important parts of the index get read into
>> memory and the number of segments is pretty irrelevant. You do mention
>> that you have a heavy ingestion pipeline, which leads me to wonder
>> whether you're committing too often. What are your commit parameters?
>>
>> For % deleted docs, I'm really talking about deletedDocs/numDocs.
>>
>> I suppose the interesting question is whether the CPU utilization
>> you're seeing is _always_ correlated with the number of segments, or
>> whether certain machines always have high CPU utilization. I suppose
>> you could issue a commit and see what difference that made.
>>
>> I rather doubt that the number of segments is the underlying issue,
>> but that's nothing but a SWAG...
>>
>> Best,
>> Erick
>>
>> On Sat, Feb 22, 2014 at 6:16 PM, KNitin <nitin.t...@gmail.com> wrote:
>>
>>> Thanks, Erick.
>>>
>>> *2> There are, but you'll have to dig.*
>>>
>>> Any pointers on where to get started?
>>>
>>> *3> Well, I'd ask a counter-question. Are you seeing unacceptable
>>> performance? If not, why worry? :)*
>>>
>>> When you say %, do you refer to deleted_docs/NumDocs or
>>> deleted_docs/Max_docs?
>>> To answer your question: yes, I see some of our shards taking 3x more
>>> time and 3x more CPU than other shards for the same queries and the
>>> same number of hits (all shards have exactly the same number of docs,
>>> but a few shards have more deleted documents than the rest).
>>>
>>> My understanding is that search time/CPU would increase with the
>>> number of segments? The core of my issue is that a few nodes are
>>> running with extremely high CPU (90%+) while the rest run under 30%
>>> CPU, and the only difference between them is the number of segments
>>> in the shards on those machines. The nodes running hot have shards
>>> with 30 segments; the ones with lower CPU have shards with 20
>>> segments and far fewer deleted documents.
>>>
>>> Is it possible that a difference of 10 segments could impact CPU and
>>> search time?
>>>
>>> Thanks
>>> - Nitin
>>>
>>> On Sat, Feb 22, 2014 at 4:36 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>>> 1> It Depends. Soft commits will not add a new segment. Hard commits
>>>> with openSearcher=true or false _will_ create a new segment.
>>>> 2> There are, but you'll have to dig.
>>>> 3> Well, I'd ask a counter-question. Are you seeing unacceptable
>>>> performance? If not, why worry? :)
>>>>
>>>> A better answer is that 24-28 segments is not at all unusual.
>>>>
>>>> By and large, don't bother with optimize/force merge. What I would
>>>> do is look at the admin screen and note the percentage of deleted
>>>> documents. If it's above some arbitrary number (I typically use
>>>> 15-20%) and _stays_ there, consider optimizing.
>>>>
>>>> However! There is a parameter you can explicitly set in
>>>> solrconfig.xml (sorry, which one escapes me now) that increases the
>>>> "weight" of the % of deleted documents when the merge policy decides
>>>> which segments to merge.
>>>> Upping this number will have the effect of more aggressively merging
>>>> segments with a greater % of deleted docs. But those segments are
>>>> already pretty heavily weighted for merging...
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Sat, Feb 22, 2014 at 1:23 PM, KNitin <nitin.t...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have the following questions:
>>>>>
>>>>> 1. I have a job that runs for 3-4 hours, continuously committing
>>>>>    data to a collection with an auto commit of 30 seconds. Does
>>>>>    that mean I get a new Solr segment every 30 seconds?
>>>>> 2. My current segment merge policy is set to 10. Will the merger
>>>>>    always continue running in the background to reduce the number
>>>>>    of segments? Is there a way to see metrics about segment merging
>>>>>    from Solr (MBeans or any other way)?
>>>>> 3. A few of my collections are very large, with around 24-28
>>>>>    segments per shard and around 16 shards. Is it bad to have this
>>>>>    many segments per shard? Is it good practice to optimize the
>>>>>    index very often, or should I just rely on segment merges alone?
>>>>>
>>>>> Thanks for the help in advance,
>>>>> Nitin
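[Editor's note] The parameter Erick could not recall is most likely TieredMergePolicy's `reclaimDeletesWeight`, which biases segment selection toward segments carrying many deleted documents. A sketch of how it might be set in Solr 4.x-era solrconfig.xml; the value 3.0 is illustrative, and the exact XML shape should be verified against your Solr version:

```xml
<!-- solrconfig.xml (inside <indexConfig>): raise the weight given to
     deleted docs when picking segments to merge. The Lucene default
     is 2.0; higher values merge away deletions more aggressively. -->
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <double name="reclaimDeletesWeight">3.0</double>
  </mergePolicy>
</indexConfig>
```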
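[Editor's note] For concreteness, the 30-second auto commit with openSearcher=false that Nitin describes would look roughly like this in solrconfig.xml (values illustrative):

```xml
<!-- solrconfig.xml: hard commit every 30s without opening a new
     searcher. Segments are flushed to disk, but the changes are not
     visible to search until an explicit commit or soft commit. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>30000</maxTime> <!-- milliseconds -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Each such hard commit closes the current segment, which is consistent with Erick's point that a long ingestion run accumulates segments until the merge policy catches up.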
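[Editor's note] The deleted-document ratio discussed in this thread can be read off Solr's Luke request handler, which reports `numDocs` (live docs) and `maxDoc` (live + deleted docs) per core. A minimal sketch; the host, port, and core name `collection1` are placeholder assumptions, adjust for your deployment:

```python
import json
from urllib.request import urlopen


def deleted_doc_ratio(num_docs: int, max_doc: int) -> float:
    """Fraction of the index occupied by deleted docs not yet merged away.

    maxDoc counts live + deleted documents; numDocs counts live only,
    so maxDoc - numDocs is the number of deleted docs still in segments.
    """
    if max_doc == 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


def fetch_index_stats(base_url="http://localhost:8983/solr/collection1"):
    # The Luke handler reports per-core index stats; numTerms=0 keeps
    # the response small. URL and core name are assumptions.
    with urlopen(f"{base_url}/admin/luke?numTerms=0&wt=json") as resp:
        index = json.load(resp)["index"]
    return index["numDocs"], index["maxDoc"]


# Example: 6M live docs in segments that still hold 10M document slots
# corresponds to the 40% deleted-docs situation described in the thread.
print(deleted_doc_ratio(6_000_000, 10_000_000))  # 0.4
```

Comparing this ratio across shards is a quick way to confirm Nitin's observation that the hot shards carry more deleted documents.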