I've never seen it necessary to run "thousands of queries" to warm Solr. Usually fewer than a dozen will work fine. My challenge would be for you to measure the performance difference on queries after running, say, 12 well-chosen queries as opposed to hundreds/thousands. I bet that if

1> you search across all the relevant fields, you'll fill up the low-level caches for those fields.
2> you facet on all the fields you intend to facet on.
3> you sort on all the fields you intend to sort on.
4> you specify some filter queries. This is fuzzy since it really depends on you being able to predict what those will be for firstSearcher. Things like "in the last day/week/month" can be pre-configured, but others you won't get. BTW, here's a blog about why "in the last day" fq clauses can be tricky: http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/

then you'll pretty much nail warmup and be fine.

Note that you can do all the faceting on a single query. Specifying the primary, secondary, etc. sorts will fill those caches.

Best,
Erick

On Mon, Jul 21, 2014 at 5:07 PM, Jeff Wartes <jwar...@whitepages.com> wrote:

>
> On 7/21/14, 4:50 PM, "Shawn Heisey" <s...@elyograg.org> wrote:
>
> >On 7/21/2014 5:37 PM, Jeff Wartes wrote:
> >> I'd like to ensure an extended warmup is done on each SolrCloud node
> >> prior to that node serving traffic.
> >> I can do certain things prior to starting Solr, such as pump the index
> >> dir through /dev/null to pre-warm the filesystem cache, and post-start I
> >> can use the ping handler with a health check file to prevent the node
> >> from entering the client's load balancer until I'm ready.
> >> What I seem to be missing is control over when a node starts
> >> participating in queries sent to the other nodes.
> >>
> >> I can, of course, add solrconfig.xml firstSearcher queries, which I
> >> assume (and fervently hope!) happens before a node registers itself in
> >> ZK clusterstate.json as ready for work, but that doesn't scale so well
> >> if I want that initial warmup to run thousands of queries, or run them
> >> with some parallelism. I'm storing solrconfig.xml in ZK, so I'm
> >> sensitive to the size.
> >>
> >> Any ideas, or corrections to my assumptions?
> >
> >I think that firstSearcher/newSearcher (and making sure useColdSearcher
> >is set to false) is going to be the only way you can do this in a way
> >that's compatible with SolrCloud. If you were doing manual distributed
> >search without SolrCloud, you'd have more options available.
> >
> >If useColdSearcher is set to false, that should keep *everything* from
> >using the searcher until the warmup has finished. I cannot be certain
> >that this is the case, but I have some reasonable confidence that this
> >is how it works. If you find that it doesn't behave this way, I'd call
> >it a bug.
> >
> >Thanks,
> >Shawn
>
> Thanks for the quick reply. Since distributed search latency is the max of
> the shard sub-requests, I'm trying my best to minimize any spikes in
> cluster latency due to node restarts.
> I double-checked useColdSearcher was false, but the doc says this means
> requests "block until the first searcher is done warming", which
> translates pretty clearly to "latency spike". The more I think about it,
> the more worried I am that a node might indeed register itself in
> live_nodes and get distributed requests before it's got a searcher to work
> with. *Especially* if I have lots of serial firstSearcher queries.
>
> I'll look through the code myself tomorrow, but if anyone can help
> confirm/deny the order of operations here, I'd appreciate it.
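For reference, here is a minimal sketch of how Erick's four suggestions might look as a firstSearcher listener in solrconfig.xml, together with the useColdSearcher setting Shawn mentions. The field names (title, body, category, author, price, timestamp) and the query text "solr" are hypothetical placeholders; substitute whatever your schema and typical traffic actually use. The fq rounds NOW to the day, which is the point of the date-math blog post linked in Erick's reply: an unrounded NOW resolves to the current millisecond, so each such filter is a new cache entry and the warmed one would never be reused.

```xml
<!-- Excerpt of the <query> section of solrconfig.xml; other elements omitted.
     All field names and query text below are hypothetical placeholders. -->
<query>
  <!-- Requests block until the first searcher has finished warming,
       rather than being served by an unwarmed ("cold") searcher. -->
  <useColdSearcher>false</useColdSearcher>

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <!-- 1> and 2>: search the relevant fields and facet on every
           facet field in a single request -->
      <lst>
        <str name="defType">edismax</str>
        <str name="q">solr</str>
        <str name="qf">title body</str>
        <str name="facet">true</str>
        <!-- repeated names are passed as a multi-valued parameter -->
        <str name="facet.field">category</str>
        <str name="facet.field">author</str>
        <!-- 4>: a filter you expect users to send; NOW is rounded to
             the day so the cached filter can be reused -->
        <str name="fq">timestamp:[NOW/DAY-7DAYS TO NOW/DAY+1DAY]</str>
      </lst>
      <!-- 3>: primary and secondary sort fields -->
      <lst>
        <str name="q">*:*</str>
        <str name="sort">price asc, timestamp desc</str>
      </lst>
    </arr>
  </listener>
</query>
```

Packing all the facet fields into one request, as Erick suggests, keeps the warming list short, which also keeps solrconfig.xml small when it is stored in ZK.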