Well, it's always possible. But I wouldn't expect the search time/CPU utilization to increase with # segments, within reasonable limits. At some point, the important parts of the index get read into memory and the number of segments is pretty irrelevant. You do mention that you have a heavy ingestion pipeline, which leads me to wonder whether you're committing too often. What are your commit parameters?

For % deleted docs, I'm really talking about deletedDocs/numDocs.

I suppose the interesting question is whether the CPU utilization you're seeing is _always_ correlated with # segments, or whether certain machines always have the high CPU utilization. I suppose you could issue a commit and see what difference that makes.

I rather doubt that the # of segments is the underlying issue, but that's nothing but a SWAG...

Best,
Erick

On Sat, Feb 22, 2014 at 6:16 PM, KNitin <nitin.t...@gmail.com> wrote:

> Thanks, Erick.
>
> *2> There are, but you'll have to dig.*
>
> >> Any pointers on where to get started?
>
> *3> Well, I'd ask a counter-question. Are you seeing unacceptable
> performance? If not, why worry? :)*
>
> >> When you mean % do you refer to deleted_docs/NumDocs or
> deleted_docs/Max_docs ? To answer your question, yes i see some of our
> shards taking 3x more time and 3x more cpu than other shards for the same
> queries and same number of hits (all shards have exact same number of docs
> but i see a few shards having more deleted documents than the rest).
>
> My understanding is that the Search time /CPU would increase with # of
> segments ? The core of my issue is that few nodes are running with
> extremely high CPU (90+) and rest are running under 30% CPU and the only
> difference between both is the # of segments in the shards on the
> machines. The nodes running hot have shards with 30 segments and the ones
> running with lesser CPU contain 20 segments and much lesser deleted
> documents.
>
> Is it possible that a difference of 10 segments could impact CPU /Search
> time?
>
> Thanks
> - Nitin
>
> On Sat, Feb 22, 2014 at 4:36 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > 1> It Depends. Soft commits will not add a new segment. Hard commits
> > with openSearcher=true or false _will_ create a new segment.
> > 2> There are, but you'll have to dig.
> > 3> Well, I'd ask a counter-question. Are you seeing unacceptable
> > performance? If not, why worry? :)
> >
> > A better answer is that 24-28 segments is not at all unusual.
> >
> > By and large, don't bother with optimize/force merge. What I would do is
> > look at the admin screen and note the percentage of deleted documents.
> > If it's above some arbitrary number (I typically use 15-20%) and _stays_
> > there, consider optimizing.
> >
> > However! There is a parameter you can explicitly set in solrconfig.xml
> > (sorry, which one escapes me now) that increases the "weight" of the %
> > deleted documents when the merge policy decides which segments
> > to merge. Upping this number will have the effect of more aggressively
> > merging segments with a greater % of deleted docs. But these are already
> > pretty heavily weighted for merging already...
> >
> > Best,
> > Erick
> >
> > On Sat, Feb 22, 2014 at 1:23 PM, KNitin <nitin.t...@gmail.com> wrote:
> >
> > > Hi
> > >
> > > I have the following questions
> > >
> > > 1. I have a job that runs for 3-4 hours continuously committing data to
> > > a collection with auto commit of 30 seconds. Does it mean that every 30
> > > seconds I would get a new solr segment ?
> > > 2. My current segment merge policy is set to 10. Will merger always
> > > continue running in the background to reduce the segments ? Is there a
> > > way to see metrics regarding segment merging from solr (mbeans or any
> > > other way)?
> > > 3. A few of my collections are very large with around 24-28 segments per
> > > shard and around 16 shards. Is it bad to have this many segments for a
> > > shard for a collection? Is it a good practice to optimize the index very
> > > often or just rely on segment merges alone?
> > >
> > > Thanks for the help in advance
> > > Nitin
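
The rule of thumb from the thread above (take deletedDocs/numDocs, and consider optimizing only if it is above roughly 15-20% and _stays_ there) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 0.15 default threshold, and the sample counts are hypothetical, and none of it is a Solr API. The counts would have to be read off the admin screen (or wherever you collect them) at several points in time.

```python
from typing import Sequence


def deleted_ratio(deleted_docs: int, num_docs: int) -> float:
    """Return deletedDocs / numDocs, the ratio discussed in the thread."""
    if num_docs <= 0:
        # Empty or unreadable index: treat as no deletions rather than divide by zero.
        return 0.0
    return deleted_docs / num_docs


def stays_above(ratios: Sequence[float], threshold: float = 0.15) -> bool:
    """True only if there is at least one sample and every sample exceeds the threshold."""
    return len(ratios) > 0 and all(r > threshold for r in ratios)


# Hypothetical (deletedDocs, numDocs) readings for one shard, taken at different times.
samples = [(180_000, 1_000_000), (210_000, 1_000_000), (195_000, 1_000_000)]
ratios = [deleted_ratio(d, n) for d, n in samples]

if stays_above(ratios):
    print("Deleted-doc ratio stays above the threshold; consider optimizing.")
else:
    print("Ratio is not persistently high; leave merging to the merge policy.")
```

Requiring every sample to exceed the threshold is one simple way to encode "stays there"; a single high reading right after a burst of updates would not trigger it.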