We do two kinds of load testing for our Solr search farm.

1. Zipf distribution, using a log of user queries. This tests the engine
without any caching in front.

2. Flat distribution, using unique queries (sort|uniq on the above log).
This is a worst-case load with a perfect cache in front.

In both cases, we use multiple threads, and we start each thread at a
different place in the list.

Right now, the middle-tier HTTP cache is an optional part of our search
farm, so we test performance with it disabled and enabled. We see a 75%
hit rate on that cache, so only 25% of queries reach Solr, and our target
rate for Solr in the unique-query case is 4X lower than in the
Zipf-distribution case.

Also, use a lot of queries, at least 100K.

wunder
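A minimal sketch of that kind of replay harness, in Python, assuming a
Solr /select handler on localhost:8983 and a query log with one raw query
string per line (the endpoint, thread count, and request parameters are
illustrative, not from this thread). Run it as-is to replay the log for
the Zipf case, or with --unique to simulate the flat (sort|uniq) case;
each thread starts at a different offset into the list, as described above.

    # replay_queries.py - sketch of the load test described above.
    # Assumptions (not from the original post): a /select handler on
    # localhost:8983 and a log file with one raw query string per line.
    import sys
    import time
    import threading
    import urllib.parse
    import urllib.request

    SOLR_URL = "http://localhost:8983/solr/select"   # assumed endpoint
    NUM_THREADS = 8                                   # assumed thread count

    def load_queries(path, unique=False):
        """Read the query log; unique=True mimics sort|uniq (flat case)."""
        with open(path, encoding="utf-8") as f:
            queries = [line.strip() for line in f if line.strip()]
        if unique:
            queries = sorted(set(queries))
        return queries

    def worker(queries, start, results):
        """Replay the whole list once, starting at a thread-specific offset."""
        n = len(queries)
        for i in range(n):
            q = queries[(start + i) % n]
            params = urllib.parse.urlencode({"q": q, "rows": 10})
            t0 = time.time()
            with urllib.request.urlopen(SOLR_URL + "?" + params) as resp:
                resp.read()
            results.append(time.time() - t0)

    def main(log_path, unique=False):
        queries = load_queries(log_path, unique)
        results, threads = [], []
        for i in range(NUM_THREADS):
            start = (i * len(queries)) // NUM_THREADS  # spread start points
            t = threading.Thread(target=worker, args=(queries, start, results))
            t.start()
            threads.append(t)
        for t in threads:
            t.join()
        results.sort()
        print("queries: %d  median: %.0f ms  95th: %.0f ms" % (
            len(results),
            1000 * results[len(results) // 2],
            1000 * results[int(len(results) * 0.95)]))

    if __name__ == "__main__":
        main(sys.argv[1], unique="--unique" in sys.argv)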
On 8/21/08 2:34 PM, "Phillip Farber" <[EMAIL PROTECTED]> wrote:

> On rereading my original post it does sound weird. Let me try again, and
> thanks for bearing with me.
>
> I want to know how long Solr will take to process a unique query taking
> full advantage of OS I/O buffers. I think executing a set of unique
> queries from a cold start should measure that, if I knew that I was
> making maximal use of OS I/O buffering. That is, that the only variable
> in my test was the amount of memory devoted to buffering index segments.
> Is that basically correct?
>
> As I scale up shard size, I just want to know that response time is
> staying within bounds as I add documents. When response time falls out
> of bounds, I'll add another shard, but I want to know when to do that.
>
> So I guess I'm asking how to make sure that Solr is making the best use of
> OS I/O buffers on a dedicated server.
>
> Phil
>
> Otis Gospodnetic wrote:
>> Hi,
>>
>> I think you are describing some "weird" unrealistic scenarios.
>> There is typically no need to test "just Solr" without relying on disk
>> caches. Not using disk buffers will only work in trivial scenarios, but if
>> you really want to test it, run something that hogs memory while running
>> Solr perf tests on the same server.
>>
>> You often can't prewarm caches at the very beginning because you can't
>> predict queries (not always true), so yes, initially caches will be empty,
>> but then they will get filled and then you will (want to) use them. I don't
>> think there is a way to clear Solr caches.
>>
>> I can't give more advice at this time; I don't fully understand what you
>> are trying to test...
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
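Otis's "hog memory" suggestion can be as simple as pinning a large
allocation in RAM on the same box while the perf test runs, so the OS has
little memory left over for its disk cache. A crude Python sketch, with
the size an assumption to be tuned per machine:

    # hog_memory.py - crude sketch of the "hog memory" idea above: hold a
    # large allocation so the OS cannot use that RAM for disk caching while
    # the Solr perf test runs on the same server. Size in GB is an assumption.
    import sys
    import time

    def hog(gigabytes):
        chunks = []
        for _ in range(gigabytes):
            # bytearray is zero-filled on creation, so it occupies real RAM,
            # not just reserved address space
            chunks.append(bytearray(1024 * 1024 * 1024))
        print("holding %d GB; Ctrl-C to release" % gigabytes)
        while True:
            time.sleep(60)

    if __name__ == "__main__":
        hog(int(sys.argv[1]) if len(sys.argv) > 1 else 4)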
>>
>> ----- Original Message ----
>>> From: Phillip Farber <[EMAIL PROTECTED]>
>>> To: solr-user@lucene.apache.org
>>> Sent: Wednesday, August 20, 2008 1:34:20 PM
>>> Subject: Testing query response time
>>>
>>> I would like to test query response time for a set of queries. I'm not
>>> interested in capacity (Q/sec), just response time. My queries will be
>>> against an index of OCR'd books, so in the real world every query is
>>> probably unique and impossible to predict, so I don't see a way to
>>> prewarm any of the caches. I'm not sorting. I'm not faceting. I'm
>>> querying on a few fields like title, author, subject, and date in a range.
>>>
>>> Regarding initial conditions, it seems that there's no useful state into
>>> which I can put the caches. Would the best approach be to run the
>>> queries from a cold Solr startup?
>>>
>>> What about OS disk caches? I can see two arguments. One, just to test
>>> Solr, the disk caches should be empty. On the other hand, realistically,
>>> the disk caches would be full, so that argues for executing enough
>>> queries to load those and then redoing the query set (with empty Solr
>>> caches).
>>>
>>> Speaking of empty Solr caches, is there a way to flush those while Solr
>>> is running?
>>>
>>> What other system states do I need to control for to get a handle on
>>> response time?
>>>
>>> Thanks and regards,
>>>
>>> Phil
>>> ------------------------------------------
>>> Phillip Farber - http://www.umdl.umich.edu
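For the response-time question itself, a single-threaded Python sketch
along the lines Phil describes: run each unique query once against a
freshly started Solr (the cold pass), then run the same set again (the
warm pass), so the difference shows what the OS page cache and Solr caches
are contributing. The URL and parameters are assumptions; on Linux the
page cache can also be emptied between runs with
"echo 3 > /proc/sys/vm/drop_caches" (as root) rather than rebooting.

    # time_queries.py - sketch of a response-time (not throughput) test:
    # one cold pass over the unique query set, then one warm pass.
    # The Solr URL and request parameters are assumptions, not from the thread.
    import sys
    import time
    import urllib.parse
    import urllib.request

    SOLR_URL = "http://localhost:8983/solr/select"   # assumed endpoint

    def run_pass(queries):
        times = []
        for q in queries:
            params = urllib.parse.urlencode({"q": q, "rows": 10})
            t0 = time.time()
            with urllib.request.urlopen(SOLR_URL + "?" + params) as resp:
                resp.read()
            times.append(time.time() - t0)
        times.sort()
        return times

    def report(label, times):
        print("%s: n=%d median=%.0fms 95th=%.0fms max=%.0fms" % (
            label, len(times),
            1000 * times[len(times) // 2],
            1000 * times[int(len(times) * 0.95)],
            1000 * times[-1]))

    if __name__ == "__main__":
        with open(sys.argv[1], encoding="utf-8") as f:
            queries = sorted({line.strip() for line in f if line.strip()})
        report("cold", run_pass(queries))   # first pass: empty OS/Solr caches
        report("warm", run_pass(queries))   # second pass: buffers populated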