On 2/17/2014 6:12 PM, Allan Carroll wrote:
> I'm having trouble getting my Solr setup to get consistent performance.
> Average select latency is great, but 95% is dismal (10x average). It's
> probably something slightly misconfigured. I’ve seen it have nice, low
> variance latencies for a few hours here and there, but can’t figure out
> what’s different during those times.
>
> * I’m running 4.1.0 using SolrCloud. 3 replicas of 1 shard on 3 EC2 boxes
> (8proc, 30GB RAM, SSDs). Load peaks around 30 selects per second and about
> 150 updates per second.
>
> * The index has about 11GB of data in 14M docs, the other 10MB of data in 3K
> docs. Stays around 30 segments.
>
> * Soft commits after 10 seconds, hard commits after 120 seconds. Though,
> turning off the update traffic doesn’t seem to have any effect on the select
> latencies.
>
> * I think GC latency is low. Running 3GB heaps with 1G new size. GC time is
> around 3ms per second.
>
> Here’s a typical select query:
>
> fl=*,sortScore:textScore&sort=textScore desc&start=0&q=text:(("soccer" OR
> "MLS" OR "premier league" OR "FIFA" OR "world cup") OR ("sorority" OR
> "fraternity" OR "greek life" OR "dorm" OR
> "campus"))&wt=json&fq=startTime:[1392656400000 TO 1392717540000]&fq={!frange
> l=2 u=3}timeflag(startTime)&fq={!frange l=1392656400000 u=1392695940000
> cache=false}timefix(startTime,-21600000)&fq=privacy:OPEN&defType=edismax&rows=131

The first thing to say is that it's fairly normal for the 95th and 99th percentile values to be quite a lot higher than the median and average values. I don't have actual values so I don't know if it's bad or not.

You're good on the most important performance-related resource, which is memory for the OS disk cache. The only thing that stands out as a possible problem from what I know so far is garbage collection. It might be a case of full garbage collections happening too frequently, or it might be a case of garbage collection pauses taking too long. It might even be a combination of both.

To fix frequent full collections, increase the heap size. To fix the other problem, use the CMS collector and tune it.

Two bits of information will help with recommendations: your Java startup options, and your solrconfig.xml.

You're using an option in your query that I've never seen before. I don't know if frange is slow or not.

One last thing that might cause problems is super-frequent commits.

I could also be completely wrong!

Thanks,
Shawn
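
As a rough illustration of the point about percentiles, here is a minimal Python sketch showing how a small fraction of slow requests pulls the 95th and 99th percentiles far above the median and average. The sample latencies and the `latency_summary` helper are made up for the example; they are not measurements from the setup described above.

```python
import statistics


def latency_summary(samples_ms):
    """Return mean, median, p95 and p99 for a list of latencies in ms."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "median": statistics.median(samples_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Hypothetical data: 90 fast queries and 10 slow ones (for instance,
# requests that happened to overlap a long GC pause).
samples = [20.0] * 90 + [400.0] * 10
summary = latency_summary(samples)

for name, value in summary.items():
    print(f"{name}: {value:.1f} ms")
```

With real data, feeding in the per-request times from the Solr request log would show whether the tail is a steady fraction of slow queries or occasional bursts, which helps tell GC pauses apart from consistently slow query types.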