Beautiful, thank you.

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Friday, April 28, 2017 3:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Query Performance benchmarking

I use the JMeter plugins. They’ve been reorganized recently, so they aren’t where I originally downloaded them. Try this:

https://jmeter-plugins.org/wiki/RespTimePercentiles/
https://jmeter-plugins.org/wiki/JMeterPluginsCMD/

Here is the command. It processes the previous JTL output file and puts the result in test.csv.

java -Xmx2g -jar CMDRunner.jar --tool Reporter --generate-csv ${prev_dir}/${test} \
    --input-jtl ${prev_dir}/${out} --plugin-type ResponseTimesPercentiles \
    >> $logfile 2>&1

The script prints a summary of the run. I need to fix that to also print out the header for the columns.

pct25=`grep "^25.0," ${test} | cut -d , -f 2-`
median=`grep "^50.0," ${test} | cut -d , -f 2-`
pct75=`grep "^75.0," ${test} | cut -d , -f 2-`
pct90=`grep "^90.0," ${test} | cut -d , -f 2-`
pct95=`grep "^95.0," ${test} | cut -d , -f 2-`
echo `date` ": 25th percentiles are $pct25"
echo `date` ": medians are $median"
echo `date` ": 75th percentiles are $pct75"
echo `date` ": 90th percentiles are $pct90"
echo `date` ": 95th percentiles are $pct95"
echo `date` ": full results are in ${test}"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
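A minimal sketch of that missing-header fix, not from the thread: it assumes the first line of the CSV that CMDRunner writes is a header row, with the percentile label in column 1 and one response-time column per request handler.

# Sketch only -- verify that the generated CSV really starts with a
# header row like "Percentiles,/select,/suggest,..." before relying on it.
header=`head -1 ${test} | cut -d , -f 2-`
echo `date` ": handlers are $header"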
> On Apr 28, 2017, at 12:00 PM, Davis, Daniel (NIH/NLM) [C] <daniel.da...@nih.gov> wrote:
> 
> Walter,
> 
> If you can share a pointer to that JMeter add-on, I'd love it.
> 
> -----Original Message-----
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: Friday, April 28, 2017 2:53 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Query Performance benchmarking
> 
> I use production logs to get a mix of common and long-tail queries. It is very hard to get a realistic distribution with synthetic queries.
> 
> A benchmark run goes like this, with a big shell script driving it.
> 
> 1. Reload the collection to clear caches.
> 2. Split the log into a cache-warming set (usually the first 2000 queries) and the rest.
> 3. Run the warming set with four threads and no delay. This gets it done but usually does not overload the server.
> 4. Run the test set with hundreds of threads, each set for a particular rate. The overall config is usually between 2000 and 10,000 requests per minute.
> 5. Tests run for 1-2 hours.
> 6. Grep the results for non-200 responses, filter them out, and report.
> 7. Post-process the results to make a CSV file of the percentile response times, one column for each request handler.
> 
> The benchmark driver is a headless JMeter, run with two different config files (warming and test). The post-processing is a JMeter add-on.
> 
> If the CPU gets over about 60% or the run queue gets to about the number of processors, the hosts are near congestion. The response time will spike if it is pushed harder than that.
> 
> Prod logs are usually from a few hours of peak traffic during the daytime. This reduces the amount of bot traffic in the logs. I filter out load balancer health checks, Zabbix checks, and so on. I like to get a log of a million queries. That might require grabbing peak traffic logs from several days.
> 
> With the master/slave cluster, I use logs from a single slave. Those will have a lower cache hit rate because the requests are randomly spread out. For our SolrCloud cluster, I’ve created a prod-size cluster in test. Expensive!
> 
> There’s a script in the JMeter config to make /handler and /select?qt=/handler get reported as the same thing. Thank you, SolrJ.
> 
> Our SLAs are for the 95th percentile.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
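For concreteness, a bare-bones sketch of the driver Walter describes. Nothing in it is taken from his actual script: the file names, the .jmx plan names, and the 2000-query split point are illustrative, and it assumes the two test plans read their queries from the files written here.

#!/bin/bash
# Illustrative driver only. Assumes two JMeter test plans, warm.jmx and
# test.jmx, that read queries from warm_queries.txt / test_queries.txt.
queries=prod_queries.txt

# Step 2: the first 2000 queries warm the caches, the rest are the test set.
head -2000 $queries > warm_queries.txt
tail -n +2001 $queries > test_queries.txt

# Step 3: warming run -- headless JMeter (-n), results to a JTL file.
jmeter -n -t warm.jmx -l warm.jtl

# Step 4: the real run, with the full thread configuration.
jmeter -n -t test.jmx -l test.jtl

# Step 6: keep only 200 responses before reporting. Assumes the default
# CSV-style JTL, where the response code is the fourth field.
awk -F, '$4 == 200' test.jtl > test_ok.jtl

The per-thread request rates live in the test plan itself; a Constant Throughput Timer per thread group is one way to get the "hundreds of threads, each set for a particular rate" behavior.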
>> On Apr 28, 2017, at 11:39 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>> 
>> Well, the best way to get no cache hits is to set the cache sizes to zero ;). That provides worst-case scenarios and tells you exactly how much you're relying on caches. I'm not talking about the lower-level Lucene caches here.
>> 
>> One thing I've done is use the TermsComponent to generate a list of terms actually in my corpus, and save them away "somewhere" to substitute into my queries. The problem with that is when you have anything except very simple queries involving AND, you generate unrealistic queries when you substitute in random values; you can be asking for totally unrelated terms, and especially on short fields that leads to lots of 0-hit queries, which are also unrealistic.
>> 
>> So you get into a long cycle of generating a bunch of queries and removing all queries with fewer than N hits when you run them. Then generating more. Then... And each time you pick N, it introduces another layer of not-real-world possibility.
>> 
>> Sometimes it's the best you can do, but if you can cull queries from real-world applications it's _much_ better. Once you have a bunch (I like 10,000) you can be pretty confident. I not only like to run them randomly, but I also like to sub-divide them into N buckets and then run each bucket in order, on the theory that that mimics what users actually did; they don't usually just do stuff at random. Any differences between the random and non-random runs can give interesting information.
>> 
>> Best,
>> Erick
>> 
>> On Fri, Apr 28, 2017 at 9:38 AM, Rick Leir <rl...@leirtech.com> wrote:
>>> (aside: Using Gatling or JMeter?)
>>> 
>>> Question: How can you easily randomize something in the query so you get no cache hits? I think there are several levels of caching.
>>> 
>>> --
>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
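To make Erick's TermsComponent idea concrete, a rough sketch of the generate-then-cull cycle. The core name, field name, and hit threshold below are placeholders; it assumes the /terms handler is available (implicit in recent Solr versions, otherwise wired up in solrconfig.xml) and that jq is installed. Disabling the caches for his worst-case runs is a solrconfig.xml change: comment out the filterCache, queryResultCache, and documentCache entries, or set their size to 0.

#!/bin/bash
# Rough sketch; all names below are placeholders, not from the thread.
SOLR=http://localhost:8983/solr/mycore   # hypothetical core URL
FIELD=title                              # hypothetical text field
N=10                                     # the arbitrary hit threshold ("N")

# Harvest the 1000 highest-frequency terms in the field. In the JSON
# response the terms alternate with their counts, so keep only the strings.
curl -s "$SOLR/terms?terms.fl=$FIELD&terms.limit=1000&wt=json" \
  | jq -r ".terms.$FIELD[] | select(type == \"string\")" > terms.txt

# Cull terms that match fewer than N documents. rows=0 keeps this cheap;
# numFound is all we read. Terms containing spaces or query-syntax
# characters would need URL-encoding first.
while read term; do
  hits=$(curl -s "$SOLR/select?q=$FIELD:$term&rows=0&wt=json" \
           | jq .response.numFound)
  [ "$hits" -ge "$N" ] && echo "$term"
done < terms.txt > queries.txt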