Thanks, Erick. Turning off the query cache and sharding more aggressively helped bring down the latencies.

On Thu, Feb 20, 2014 at 5:07 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> What you _do_ want to do is add replicas so you distribute the CPU
> load across a bunch of machines.
>
> The QueryResultCache isn't very useful unless you have multiple queries
> that
> 1> reference the _exact_ same query, q, fq, sorting and all
> 2> don't page very far.
>
> This cache really only holds the document (internal Lucene) IDs for a
> "window" of hits. So say your window (configured in solrconfig.xml) is
> set to 50. For each of the query keys, 50 IDs are stored. Next time that
> exact query comes in, and _assuming_ start+rows < 50, you'll get the IDs
> from the cache and not much action occurs. The design intent here is to
> satisfy a few pages of results.
>
> If you mean by "tail queries" that there is very little repetition of
> queries, then why bother with a cache at all? If the hit ratio is going
> towards 0 it's not doing you enough good to matter.
>
> FWIW,
> Erick
>
> On Thu, Feb 20, 2014 at 1:58 PM, KNitin <nitin.t...@gmail.com> wrote:
>
> > Hello
> >
> > I have a 4 node cluster running Solr cloud 4.3.1. I have a few large
> > collections sharded 8 ways across all the 4 nodes (with 2 shards per
> > node). The size of the shard for the large collections is around
> > 600-700 MB, containing around 250K+ documents.
> >
> > Currently the size of the query cache is around 512. We have a few jobs
> > that run tail queries on these collections. The hit ratio of the cache
> > drops to 0 when running these queries and at the same time CPU spikes.
> > The latencies are on the order of seconds in the above case. I verified
> > GC behavior is normal (not killing CPU).
> >
> > The following are my questions:
> >
> > 1. Is it a good practice to vary the Query Result Cache size based on
> >    the size of the collection (large collections have large cache)?
> > 2. If most of your queries are tail queries, what is a good way to make
> >    your cache usage effective (higher hits)?
> > 3. If, let's say, all your queries miss the cache, is it OK behavior if
> >    your CPU spikes (to 90+%)?
> > 4. Is there a recommended shard size (# of docs, size) to use? A few of
> >    my collections are 100-200 MB and the large ones are on the order of
> >    800 MB-1 GB.
> >
> > Thanks a lot in advance
> > Nitin
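
For anyone reading this thread later, here is a rough Python sketch of the windowed cache behavior Erick describes above. It is a toy model, not Solr's implementation: the class `WindowedResultCache`, the helper `fake_query`, and the inclusive `start + rows <= window` boundary are all assumptions made for illustration. The defaults of 512 entries and a window of 50 come from the numbers mentioned in the thread.

```python
from collections import OrderedDict


class WindowedResultCache:
    """Toy LRU cache (hypothetical, not Solr code): one entry per exact
    (q, fq, sort) key, holding only the first `window` document IDs."""

    def __init__(self, max_entries=512, window=50):
        self.max_entries = max_entries
        self.window = window
        self.entries = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def search(self, q, fq, sort, start, rows, run_query):
        key = (q, tuple(fq), sort)  # any difference in q, fq or sort is a new key
        self.lookups += 1
        ids = self.entries.get(key)
        # A hit needs the exact key AND a page that fits inside the window.
        if ids is not None and start + rows <= self.window:
            self.hits += 1
            self.entries.move_to_end(key)
            return ids[start:start + rows]
        # Miss: do the expensive work, then keep only the window of IDs.
        ids = run_query(q, fq, sort, max(self.window, start + rows))
        self.entries[key] = ids[:self.window]
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
        return ids[start:start + rows]

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


def fake_query(q, fq, sort, limit):
    """Stand-in for the real search: returns `limit` made-up doc IDs."""
    return list(range(limit))


# Same query repeated ten times: one miss, then nine hits.
repeated = WindowedResultCache()
for _ in range(10):
    repeated.search("title:solr", ["type:doc"], "score desc", 0, 10, fake_query)

# "Tail" traffic: every query is different, so nothing is ever reused.
tail = WindowedResultCache()
for i in range(10):
    tail.search(f"id:{i}", [], "score desc", 0, 10, fake_query)

print(repeated.hit_ratio(), tail.hit_ratio())
```

In this model the repeated workload ends with a hit ratio of 0.9 and the tail workload stays at 0.0, which matches the point that a result cache does little when queries rarely repeat. Paging past the window (for example `start=100`) is also a miss even for a repeated query.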