Thanks, Erick. Turning off the query cache and sharding more aggressively
helped bring down the latencies.
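
For anyone reading this thread later: the windowing behavior Erick describes
below can be modeled roughly as follows. This is only a toy Python sketch, not
Solr's actual implementation. The names WindowedResultCache and fake_index are
made up for illustration, and 512 and 50 are simply the cache size and window
mentioned in this thread. The quoted mail states the hit condition as
start+rows < 50; the sketch assumes <= on the grounds that 50 stored IDs cover
rows 0 through 49.

```python
from collections import OrderedDict


class WindowedResultCache:
    """Toy LRU cache keeping only the first `window` doc IDs per query key."""

    def __init__(self, max_entries=512, window=50):
        self.max_entries = max_entries  # assumed: cache size from the thread
        self.window = window            # assumed: window size from the thread
        self.entries = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def search(self, q, fq, sort, start, rows, run_query):
        # Any difference in q, fq or sort produces a different key.
        key = (q, fq, sort)
        self.lookups += 1
        cached = self.entries.get(key)
        # Hit only if the entry exists and the page lies inside the window.
        if cached is not None and start + rows <= self.window:
            self.entries.move_to_end(key)
            self.hits += 1
            return cached[start:start + rows]
        # Miss: run the real search, then store just one window of IDs.
        ids = run_query(q, fq, sort, max(self.window, start + rows))
        self.entries[key] = ids[:self.window]
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
        return ids[start:start + rows]

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


def fake_index(q, fq, sort, limit):
    """Hypothetical stand-in for the real search; returns fake doc IDs."""
    return list(range(limit))


# The same query paged five times: every page after the first is a hit.
repeated = WindowedResultCache()
for page in range(5):
    repeated.search("title:solr", "type:doc", "score desc",
                    page * 10, 10, fake_index)

# 1000 distinct "tail" queries: no key ever repeats, so nothing hits.
tail = WindowedResultCache()
for i in range(1000):
    tail.search(f"id:{i}", "type:doc", "score desc", 0, 10, fake_index)

print(repeated.hit_ratio(), tail.hit_ratio())
```

In this model the tail workload pays the full search cost on every request and
still churns the cache, which matches the case where turning the cache off
costs nothing.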


On Thu, Feb 20, 2014 at 5:07 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> What you _do_ want to do is add replicas so you distribute the CPU
> load across a bunch of machines.
>
> The QueryResultCache isn't very useful unless you have multiple queries that
> 1> reference the _exact_ same query, q, fq, sorting and all
> 2> don't page very far.
>
> This cache really only holds the document (internal Lucene) IDs for a
> "window" of hits. So say your window (configured in solrconfig.xml) is
> set to 50. For each of the query keys, 50 IDs are stored. Next time that
> exact query comes in, and _assuming_ start+rows < 50, you'll get the IDs
> from the cache and not much action occurs. The design intent here is to
> satisfy a few pages of results.
>
> If you mean by "tail queries" that there is very little repetition of
> queries, then why bother with a cache at all? If the hit ratio is going
> towards 0 it's not doing you enough good to matter.
>
>
> FWIW,
> Erick
>
>
> On Thu, Feb 20, 2014 at 1:58 PM, KNitin <nitin.t...@gmail.com> wrote:
>
> > Hello
> >
> >   I have a 4-node cluster running SolrCloud 4.3.1. I have a few large
> > collections sharded 8 ways across all 4 nodes (with 2 shards per node).
> > The size of each shard for the large collections is around 600-700 MB,
> > containing around 250K+ documents.
> >
> > Currently the size of the query cache is around 512. We have a few jobs
> > that run tail queries on these collections. The hit ratio of the cache
> > drops to 0 when running these queries, and at the same time CPU spikes.
> > The latencies are on the order of seconds in the above case. I verified
> > that GC behavior is normal (not killing CPU).
> >
> > The following are my questions
> >
> >
> >    1. Is it good practice to vary the Query Result Cache size based on
> >    the size of the collection (large collections get a large cache)?
> >    2. If most of your queries are tail queries, what is a good way to
> >    make your cache usage effective (higher hit ratio)?
> >    3. If, let's say, all your queries miss the cache, is it OK for the
> >    CPU to spike (to 90+%)?
> >    4. Is there a recommended shard size (# of docs, size) to use? A few
> >    of my collections are 100-200 MB and the large ones are on the order
> >    of 800 MB-1 GB.
> >
> > Thanks a lot in advance
> > Nitin
> >
>
