Well, the best way to get no cache hits is to set the cache sizes to zero ;). That gives you the worst-case scenario and tells you exactly how much you're relying on caches. I'm not talking about the lower-level Lucene caches here.
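In case it helps, here's a minimal sketch of zeroing those caches without hand-editing solrconfig.xml, using the Config API (Solr 5+). The core name "mycore" is a placeholder, and you should double-check the property names against the Config API docs for your version:

import requests

# Zero out the main query-time caches via the Config API. Equivalent to
# setting size="0" and autowarmCount="0" on the caches in solrconfig.xml.
SOLR = "http://localhost:8983/solr/mycore"

requests.post(SOLR + "/config", json={
    "set-property": {
        "query.filterCache.size": 0,
        "query.filterCache.autowarmCount": 0,
        "query.queryResultCache.size": 0,
        "query.queryResultCache.autowarmCount": 0,
        "query.documentCache.size": 0,
    }
}).raise_for_status()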
One thing I've done is use the TermsComponent to generate a list of terms actually in my corpus, and save them away "somewhere" to substitute into my queries. The problem with that is that once you have anything except very simple queries involving AND, you generate unrealistic queries when you substitute in random values; you can be asking for totally unrelated terms, and especially on short fields that leads to lots of 0-hit queries, which are also unrealistic. So you get into a long cycle of generating a bunch of queries and removing all the queries with fewer than N hits when you run them. Then generating more. Then... And each time you pick N, it introduces another possible layer of not-real-world behavior. Sometimes it's the best you can do, but if you can cull queries from real-world applications (your logs, say) it's _much_ better.

Once you have a bunch (I like 10,000) you can be pretty confident. I not only like to run them randomly, but I also like to sub-divide them into N buckets and then run each bucket in order, on the theory that that mimics what users actually did; they don't usually just do stuff at random. Any differences between the random and non-random runs can give interesting information. Rough sketches of both steps are below the quoted message.

Best,
Erick

On Fri, Apr 28, 2017 at 9:38 AM, Rick Leir <rl...@leirtech.com> wrote:
> (aside: Using Gatling or Jmeter?)
>
> Question: How can you easily randomize something in the query so you get no
> cache hits? I think there are several levels of caching.
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
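First, the harvest/substitute/cull cycle. Everything here is a placeholder you'd adapt: the core name, the field, the two-term AND template, and the cutoff N. It assumes a /terms handler wired to the TermsComponent, and reads the response in its default flat JSON form; real code would also escape any terms containing query syntax:

import random
import requests

SOLR = "http://localhost:8983/solr/mycore"   # placeholder core
FIELD = "body"                               # placeholder field
MIN_HITS = 10                                # the "N" cutoff; every choice of N is another guess

# 1. Pull terms actually in the corpus via the TermsComponent.
resp = requests.get(SOLR + "/terms", params={
    "terms.fl": FIELD, "terms.limit": 5000, "wt": "json"})
flat = resp.json()["terms"][FIELD]   # default json.nl=flat: [term, count, term, count, ...]
terms = flat[0::2]

# 2. Substitute random term pairs into a simple AND template.
candidates = ["%s:%s AND %s:%s" % (FIELD, random.choice(terms),
                                   FIELD, random.choice(terms))
              for _ in range(20000)]

# 3. Cull the unrealistic near-0-hit queries; loop back to step 2 as needed.
keepers = []
for q in candidates:
    hits = requests.get(SOLR + "/select",
                        params={"q": q, "rows": 0, "wt": "json"}
                        ).json()["response"]["numFound"]
    if hits >= MIN_HITS:
        keepers.append(q)
    if len(keepers) >= 10000:
        break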
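And one reading of the random-vs-bucketed comparison, reusing keepers from the sketch above: visit the buckets in random order, but replay each bucket's queries in their original order, then compare timings against a fully shuffled run:

import random
import time

def replay(queries):
    for q in queries:
        requests.get(SOLR + "/select", params={"q": q, "rows": 10, "wt": "json"})

# Random run: the whole set, fully shuffled.
shuffled = list(keepers)
random.shuffle(shuffled)
t0 = time.perf_counter()
replay(shuffled)
print("random run: %.1fs" % (time.perf_counter() - t0))

# Bucketed run: N contiguous buckets, each replayed in its original order,
# since users don't usually just do stuff at random.
N = 10
size = max(1, len(keepers) // N)
buckets = [keepers[i * size:(i + 1) * size] for i in range(N)]
random.shuffle(buckets)
t0 = time.perf_counter()
for bucket in buckets:
    replay(bucket)
print("bucketed run: %.1fs" % (time.perf_counter() - t0))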