I've never found it necessary to run "thousands of queries"
to warm Solr. Usually fewer than a dozen will work fine. My
challenge would be for you to measure performance differences
on queries after running, say, 12 well-chosen queries as
opposed to hundreds/thousands. I bet that if
1> you search across all the relevant fields, you'll fill up the
     low-level caches for those fields.
2> you facet on all the fields you intend to facet on.
3> you sort on all the fields you intend to sort on.
4> you specify some filter queries. This is fuzzier, since it
     really depends on your being able to predict what
     those will be for firstSearcher. Things like "in the
     last day/week/month" can be pre-configured, but
     others you won't get. BTW, here's a blog post about
     why "in the last day" fq clauses can be tricky:
   http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/

that you'll pretty much nail warmup and be fine. Note that
you can do all the faceting in a single query. Specifying
the primary, secondary, etc. sorts will fill those caches.
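A minimal sketch of what such a firstSearcher block might look
like in solrconfig.xml, with one entry per point 1> through 4>.
The field names (title, body, category, author, price,
last_modified) and the edismax qf setup are hypothetical
placeholders; substitute your own schema. The fq uses rounded
date math (NOW/DAY) so the cached filter stays reusable for the
whole day, which is the issue the blog post above discusses.

```xml
<!-- Goes inside the <query> section of solrconfig.xml.
     All field names below are hypothetical placeholders. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- 1> search across all the relevant fields -->
    <lst>
      <str name="q">warmup</str>
      <str name="defType">edismax</str>
      <str name="qf">title body</str>
    </lst>
    <!-- 2> facet on every facet field in a single request -->
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
      <str name="facet.field">author</str>
    </lst>
    <!-- 3> primary and secondary sorts -->
    <lst>
      <str name="q">*:*</str>
      <str name="sort">price asc, last_modified desc</str>
    </lst>
    <!-- 4> predictable filter queries; NOW/DAY rounding keeps them cacheable -->
    <lst>
      <str name="q">*:*</str>
      <str name="fq">last_modified:[NOW/DAY-7DAYS TO NOW/DAY+1DAY]</str>
    </lst>
  </arr>
</listener>
```

Keeping it to a handful of entries like this also keeps the
file small, which bears on the ZK-size concern raised below.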

Best,
Erick


On Mon, Jul 21, 2014 at 5:07 PM, Jeff Wartes <jwar...@whitepages.com> wrote:

>
> On 7/21/14, 4:50 PM, "Shawn Heisey" <s...@elyograg.org> wrote:
>
> >On 7/21/2014 5:37 PM, Jeff Wartes wrote:
> >> I'd like to ensure an extended warmup is done on each SolrCloud node
> >>prior to that node serving traffic.
> >> I can do certain things prior to starting Solr, such as pump the index
> >>dir through /dev/null to pre-warm the filesystem cache, and post-start I
> >>can use the ping handler with a health check file to prevent the node
> >>from entering the clients' load balancer until I'm ready.
> >> What I seem to be missing is control over when a node starts
> >>participating in queries sent to the other nodes.
> >>
> >> I can, of course, add solrconfig.xml firstSearcher queries, which I
> >>assume (and fervently hope!) happens before a node registers itself in
> >>ZK clusterstate.json as ready for work, but that doesn't scale so well
> >>if I want that initial warmup to run thousands of queries, or run them
> >>with some parallelism. I'm storing solrconfig.xml in ZK, so I'm sensitive
> >>to the size.
> >>
> >> Any ideas, or corrections to my assumptions?
> >
> >I think that firstSearcher/newSearcher (and making sure useColdSearcher
> >is set to false) is going to be the only way you can do this in a way
> >that's compatible with SolrCloud.  If you were doing manual distributed
> >search without SolrCloud, you'd have more options available.
> >
> >If useColdSearcher is set to false, that should keep *everything* from
> >using the searcher until the warmup has finished.  I cannot be certain
> >that this is the case, but I have some reasonable confidence that this
> >is how it works.  If you find that it doesn't behave this way, I'd call
> >it a bug.
> >
> >Thanks,
> >Shawn
>
>
> Thanks for the quick reply. Since distributed search latency is the max of
> the shard sub-requests, I'm trying my best to minimize any spikes in
> cluster latency due to node restarts.
> I double-checked useColdSearcher was false, but the doc says this means
> requests "block until the first searcher is done warming", which
> translates pretty clearly to "latency spike". The more I think about it,
> the more worried I am that a node might indeed register itself in
> live_nodes and get distributed requests before it's got a searcher to work
> with. *Especially* if I have lots of serial firstSearcher queries.
>
> I'll look through the code myself tomorrow, but if anyone can help
> confirm/deny the order of operations here, I'd appreciate it.
>
>
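For reference, the useColdSearcher setting Shawn mentions sits
in the <query> section of solrconfig.xml, alongside any warming
listeners. A minimal sketch; the newSearcher listener re-warms
each searcher opened after a commit, and the sort field name
(last_modified) is a hypothetical placeholder.

```xml
<query>
  <!-- Other <query> settings (caches, etc.) omitted.
       false: if no searcher is registered yet, requests block
       until the first searcher finishes warming rather than
       using it cold. -->
  <useColdSearcher>false</useColdSearcher>

  <!-- Runs against every new searcher opened after a commit.
       The sort field below is a hypothetical placeholder. -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="sort">last_modified desc</str>
      </lst>
    </arr>
  </listener>
</query>
```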
