Thanks for the suggestions. I was thinking GC too, but it doesn’t feel like that's the cause. Running jstat -gcutil only shows a 10-50ms ParNew collection every 10 or 15 seconds and almost no full CMS collections. Are there any other places to look for GC activity that I might be missing?
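One way to cross-check those jstat numbers is to diff the cumulative counters between two samples. This is only a minimal sketch: it assumes the usual `jstat -gcutil` column names (YGC, YGCT, FGC, FGCT, GCT), and the sample rows and the 10-second spacing are made up for illustration, not taken from this cluster.

```python
# Hypothetical jstat -gcutil output: one header row, then two samples
# assumed to be taken 10 seconds apart. Values are illustrative only.
SAMPLE = """\
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  12.50  40.10  55.20  60.00    100    3.000      2    0.400    3.400
 10.00   0.00   5.30  55.40  60.00    102    3.070      2    0.400    3.470
"""


def parse_gcutil(text):
    """Turn jstat -gcutil text (header plus sample rows) into a list of dicts."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, samples = rows[0], rows[1:]
    return [dict(zip(header, map(float, sample))) for sample in samples]


def summarize(samples, elapsed_s):
    """Diff the cumulative counters between the first and last sample."""
    first, last = samples[0], samples[-1]
    young = int(last["YGC"] - first["YGC"])          # young collections in window
    full = int(last["FGC"] - first["FGC"])           # full collections in window
    young_time_s = last["YGCT"] - first["YGCT"]      # seconds spent in young GC
    total_time_s = last["GCT"] - first["GCT"]        # seconds spent in all GC
    return {
        "young_collections": young,
        "young_avg_ms": 1000.0 * young_time_s / young if young else 0.0,
        "full_collections": full,
        "gc_ms_per_s": 1000.0 * total_time_s / elapsed_s,
    }


if __name__ == "__main__":
    print(summarize(parse_gcutil(SAMPLE), elapsed_s=10.0))
```

Because the column names are read from the header row, the same parsing should also cope with JVM versions that print a slightly different set of columns, as long as the counter columns keep their names.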
I did a little investigation this morning and found that if I run a query once a second, every 10th query is slow. It looks suspiciously like the soft commits are causing the slowdowns. I could space the soft commits further apart. Is there anything else I can look at to make those commits less costly?

Here are the Java options:

-server
-XX:+AggressiveOpts
-XX:+UseCompressedOops
-Xmx3G
-Xms3G
-Xss256k
-XX:MaxPermSize=128m
-XX:PermSize=96m
-XX:NewSize=1024m
-XX:MaxNewSize=1024m
-XX:MaxTenuringThreshold=1
-XX:SurvivorRatio=6
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Xloggc:/var/log/tomcat7/gc-tomcat.log
-verbose:gc
-XX:GCLogFileSize=10M
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-XX:+PrintGCTimeStamps
-XX:+PrintClassHistogram
-XX:+PrintTenuringDistribution
-XX:-PrintGCApplicationStoppedTime
-DzkHost=xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181/solr
-Dcom.sun.management.jmxremote
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.endorsed.dirs=/usr/share/tomcat7/endorsed

I’m using Tomcat, though I’ve heard that Jetty can be a better choice. I’ve also attached my solrconfig.

-Allan

On February 17, 2014 at 6:06:03 PM, Shawn Heisey (s...@elyograg.org) wrote:

On 2/17/2014 6:12 PM, Allan Carroll wrote:
> I'm having trouble getting my Solr setup to get consistent performance.
> Average select latency is great, but 95% is dismal (10x average). It's
> probably something slightly misconfigured. I’ve seen it have nice, low
> variance latencies for a few hours here and there, but can’t figure out
> what’s different during those times.
>
> * I’m running 4.1.0 using SolrCloud. 3 replicas of 1 shard on 3 EC2 boxes
> (8proc, 30GB RAM, SSDs). Load peaks around 30 selects per second and about
> 150 updates per second.
>
> * The index has about 11GB of data in 14M docs, the other 10MB of data in 3K
> docs. Stays around 30 segments.
>
> * Soft commits after 10 seconds, hard commits after 120 seconds. Though,
> turning off the update traffic doesn’t seem to have any effect on the select
> latencies.
>
> * I think GC latency is low. Running 3GB heaps with 1G new size. GC time is
> around 3ms per second.
>
> Here’s a typical select query:
>
> fl=*,sortScore:textScore&sort=textScore desc&start=0&q=text:(("soccer" OR
> "MLS" OR "premier league" OR "FIFA" OR "world cup") OR ("sorority" OR
> "fraternity" OR "greek life" OR "dorm" OR
> "campus"))&wt=json&fq=startTime:[1392656400000 TO 1392717540000]&fq={!frange
> l=2 u=3}timeflag(startTime)&fq={!frange l=1392656400000 u=1392695940000
> cache=false}timefix(startTime,-21600000)&fq=privacy:OPEN&defType=edismax&rows=131
>

The first thing to say is that it's fairly normal for the 95th and 99th percentile values to be quite a lot higher than the median and average values. I don't have actual values so I don't know if it's bad or not.

You're good on the most important performance-related resource, which is memory for the OS disk cache. The only thing that stands out as a possible problem from what I know so far is garbage collection. It might be a case of full garbage collections happening too frequently, or it might be a case of garbage collection pauses taking too long. It might even be a combination of both.

To fix frequent full collections, increase the heap size. To fix the other problem, use the CMS collector and tune it.

Two bits of information will help with recommendations: your Java startup options, and your solrconfig.xml.

You're using an option in your query that I've never seen before. I don't know if frange is slow or not.

One last thing that might cause problems is super-frequent commits.

I could also be completely wrong!

Thanks,
Shawn