Hoss, brilliant as always - many thanks! =) Subclassing the SolrCache class sounds like a good way to accomplish this.
Some questions:

1) Any recommendations on which is best to sub-class? I'm guessing, for
this scenario with "rare" batch puts and no evictions, I'd be looking for
get performance. This will also be on a box with many CPUs - so I wonder
whether the older LRUCache would be preferable?

2) Would I need to worry about "auto warming" at all? I'm still a little
foggy on the lifecycle of firstSearcher versus newSearcher (is
firstSearcher really only ever called the first time the Solr instance is
started?). In any case, since the only time a commit would occur is when
batch updates, re-indexing, and re-optimizing happen (once a day,
off-peak, perhaps), I *think* I would always want to perform the same
"static warming" rather than attempting to auto-warm from the old cache -
does this make sense?

Thanks again!

Aaron

On Thu, May 24, 2012 at 7:38 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> Interesting problem,
>
> W/o making any changes to Solr, you could probably get this behavior by:
> a) sizing your cache large enough
> b) using a firstSearcher that generates your N queries on startup
> c) configuring autowarming of 100%
> d) ensuring every query you send uses cache=false
>
> The tricky part being "d".
>
> But if you don't mind writing a little Java, I think this should actually
> be fairly trivial to do w/o needing "d" at all...
>
> 1) Subclass the existing SolrCache class of your choice.
> 2) In your subclass, make "put" be a no-op if getState()==LIVE, else
> super.put(...)
>
> ...so during any warming phase (either static warming from
> firstSearcher/newSearcher, or autowarming) the cache will accept new
> objects, but once warming is done it will ignore requests to add new
> items (so it will never evict anything).
>
> Then all you need is a firstSearcher event listener that feeds you your N
> queries (model it after "QuerySenderListener", but have it read from
> whatever source you want instead of the solrconfig.xml).
>
> : The reason for this somewhat different approach to caching is that we may
> : get any number of odd queries throughout the day for which performance
> : isn't important, and we don't want any of these being added to the cache
> : or evicting other entries from the cache. We need to ensure high
> : performance for this pre-determined list of queries only (while still
> : handling other arbitrary queries, if not as quickly).
>
> FWIW: my de facto way of dealing with this in the past was to silo my
> slave machines by use case. For example, in one index I had 1 master,
> which replicated to 2*N slaves, as well as a repeater. The 2*N slaves
> were behind 2 different load balancers (N even-numbered machines and N
> odd-numbered machines), and the two sets of slaves had different static
> cache warming configs - even-numbered machines warmed queries common to
> "browsing" categories, odd-numbered machines warmed top searches. If the
> front end was doing an arbitrary search, it was routed to the load
> balancer for the odd-numbered slaves; if the front end was doing a
> category browse, the query was routed to the even-numbered slaves.
> Meanwhile, the "repeater" was replicating out to a bunch of smaller
> one-off boxes with cache configs by use case, e.g. the data-warehouse
> and analytics team had their own slave they could run their own complex
> queries against, and the tools team had a dedicated slave that various
> internal tools would query via ajax to get metadata, etc...
>
> -Hoss
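
P.S. For the archives, here's roughly what I'm picturing for steps 1) and
2) - an untested sketch written against the 3.x API (method signatures
may differ in other versions; the package and class names are made up):

  package com.example.solr;

  import org.apache.solr.search.LRUCache;
  import org.apache.solr.search.SolrCache;

  /**
   * Accepts puts only while the cache is still warming; once the new
   * searcher is registered and the cache goes LIVE, put() becomes a
   * no-op, so nothing is ever added (or evicted) after warmup.
   */
  public class WarmingOnlyLRUCache<K,V> extends LRUCache<K,V> {

    @Override
    public V put(K key, V value) {
      if (getState() == SolrCache.State.LIVE) {
        return null; // LIVE: silently drop the entry
      }
      return super.put(key, value); // still warming: cache it
    }
  }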
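
And the solrconfig.xml wiring might then look something like this (the
size is a placeholder; the stock QuerySenderListener is shown, but it
could be swapped for the custom listener Hoss described that reads the
queries from an outside source). If I'm reading things right, registering
the same queries for both firstSearcher and newSearcher, with
autowarmCount="0", should give the same static warming after every commit
instead of auto-warming from the old cache:

  <queryResultCache class="com.example.solr.WarmingOnlyLRUCache"
                    size="10000" initialSize="10000" autowarmCount="0"/>

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">first of the N pre-determined queries</str></lst>
      <!-- ...one <lst> per query... -->
    </arr>
  </listener>
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">first of the N pre-determined queries</str></lst>
      <!-- ...same list as above... -->
    </arr>
  </listener>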