Hoss, brilliant as always - many thanks! =)

Subclassing the SolrCache class sounds like a good way to accomplish this.

Some questions:
1) Any recommendations on which cache class is best to subclass? I'm
guessing that for this scenario, with "rare" batch puts and no evictions,
I'd be optimizing for get performance. This will also be on a box with many
CPUs - so I wonder if the older LRUCache would be preferable to
FastLRUCache?

2) Would I need to worry about "auto warming" at all? I'm still a little
foggy on the lifecycle of firstSearcher versus newSearcher (is
firstSearcher really only ever fired the first time the Solr instance is
started?). In any case, since the only time a commit would occur is when
batch updates, re-indexing and re-optimizing happen (once a day, off-peak,
perhaps), I *think* I would always want to perform the same "static
warming" rather than attempting to auto-warm from the old cache - does this
make sense?

Thanks again!
     Aaron

On Thu, May 24, 2012 at 7:38 PM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:

>
> Interesting problem,
>
> w/o making any changes to Solr, you could probably get this behavior by:
>  a) sizing your cache large enough.
>  b) using a firstSearcher that generates your N queries on startup
>  c) configuring autowarming of 100%
>  d) ensuring every query you send uses cache=false
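>
> (For filter queries that can be done with the {!cache=false} local param,
> e.g. fq={!cache=false}category:books - where category:books just stands in
> for one of your real filters.)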
>
>
> The tricky part being "d".
>
> But if you don't mind writing a little Java, I think this should actually
> be fairly trivial to do w/o needing "d" at all...
>
> 1) subclass the existing SolrCache implementation of your choice.
> 2) in your subclass, make "put" a no-op if getState() == LIVE, else call
> super.put(...)
>
> ...so during any warming phase (either static warming from
> firstSearcher/newSearcher, or autowarming) the cache will accept new
> objects, but once warming is done it will ignore requests to add new items
> (and so it will never evict anything).
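>
> A rough, untested sketch of (2), extending LRUCache (the class and package
> names here are just placeholders - adjust the signatures to your Solr
> version):
>
>   package com.example.solr;
>
>   import org.apache.solr.search.LRUCache;
>
>   // Behaves like a normal LRUCache while the searcher is warming
>   // (state CREATED / STATICWARMING / AUTOWARMING), but once the
>   // searcher is registered and the cache goes LIVE, every put is
>   // silently dropped - so nothing new is cached and nothing is
>   // ever evicted.
>   public class FrozenAfterWarmingCache extends LRUCache {
>     @Override
>     public Object put(Object key, Object value) {
>       if (getState() == State.LIVE) {
>         return null;  // ignore puts once live
>       }
>       return super.put(key, value);
>     }
>   }
>
> ...and then point at it from solrconfig.xml, e.g.
> class="com.example.solr.FrozenAfterWarmingCache" on your queryResultCache.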
>
> Then all you need is a firstSearcher event listener that feeds in your N
> queries (model it after "QuerySenderListener" but have it read from
> whatever source you want instead of from solrconfig.xml).
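>
> Sketch of such a listener, reading one query per line from a flat file
> (again untested, names are placeholders, and some of these packages have
> moved around between Solr versions):
>
>   package com.example.solr;
>
>   import java.io.BufferedReader;
>   import java.io.FileReader;
>
>   import org.apache.solr.common.params.CommonParams;
>   import org.apache.solr.common.util.NamedList;
>   import org.apache.solr.core.AbstractSolrEventListener;
>   import org.apache.solr.core.SolrCore;
>   import org.apache.solr.request.LocalSolrQueryRequest;
>   import org.apache.solr.request.SolrQueryRequest;
>   import org.apache.solr.response.SolrQueryResponse;
>   import org.apache.solr.search.SolrIndexSearcher;
>
>   public class FileQuerySenderListener extends AbstractSolrEventListener {
>     private final SolrCore core;
>     private String queryFile;  // one query string per line
>
>     public FileQuerySenderListener(SolrCore core) {
>       super(core);
>       this.core = core;
>     }
>
>     @Override
>     public void init(NamedList args) {
>       super.init(args);
>       queryFile = (String) args.get("queryFile");
>     }
>
>     // fired for firstSearcher (currentSearcher == null) and newSearcher
>     // events alike, so the same static warming runs after every commit
>     @Override
>     public void newSearcher(SolrIndexSearcher newSearcher,
>                             SolrIndexSearcher currentSearcher) {
>       try {
>         BufferedReader in = new BufferedReader(new FileReader(queryFile));
>         try {
>           String q;
>           while ((q = in.readLine()) != null) {
>             NamedList params = new NamedList();
>             params.add(CommonParams.Q, q);
>             SolrQueryRequest req =
>                 new LocalSolrQueryRequest(core, params);
>             try {
>               SolrQueryResponse rsp = new SolrQueryResponse();
>               core.execute(core.getRequestHandler(
>                   req.getParams().get(CommonParams.QT)), req, rsp);
>             } finally {
>               req.close();
>             }
>           }
>         } finally {
>           in.close();
>         }
>       } catch (Exception e) {
>         // warming is best-effort: log and move on
>       }
>     }
>   }
>
> ...registered under both <listener event="firstSearcher" ...> and
> <listener event="newSearcher" ...> in solrconfig.xml, with the file path
> passed in as a <str name="queryFile"> arg.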
>
> : The reason for this somewhat different approach to caching is that we may
> : get any number of odd queries throughout the day for which performance
> : isn't important, and we don't want any of these being added to the cache
> : or evicting other entries from the cache. We need to ensure high
> : performance for this pre-determined list of queries only (while still
> : handling other arbitrary queries, if not as quickly).
>
> FWIW: my de facto way of dealing with this in the past was to silo my
> slave machines by use case.  For example, in one index I had 1 master,
> which replicated to 2*N slaves, as well as a repeater.  The 2*N slaves
> were behind 2 different load balancers (N even-numbered machines and N
> odd-numbered machines), and the two sets of slaves had different static
> cache warming configs - even-numbered machines warmed queries common to
> "browsing" categories, odd-numbered machines warmed top searches.  If the
> front end was doing an arbitrary search, it was routed to the load
> balancer for the odd-numbered slaves.  If the front end was doing a
> category browse, the query was routed to the even-numbered slaves.
> Meanwhile, the "repeater" was replicating out to a bunch of smaller
> one-off boxes with cache configs by use case, i.e.: the data-warehouse
> and analytics team had their own slave they could run their own complex
> queries against, and the tools team had a dedicated slave that various
> internal tools would query via ajax to get metadata, etc...
>
> -Hoss
>
