Re: best way to cache "base" queries (before application of filters)

Yonik Seeley Wed, 20 May 2009 09:43:35 -0700

Some thoughts:

#1) This is sort of already implemented in some form... see this
section of solrconfig.xml and try uncommenting it:


   <!-- An optimization that attempts to use a filter to satisfy a search.
         If the requested sort does not include score, then the filterCache
         will be checked for a filter matching the query. If found, the filter
         will be used as the source of document ids, and then the sort will be
         applied to that.
    <useFilterForSortedQuery>true</useFilterForSortedQuery>
   -->

Unfortunately, it's currently a system-wide setting... you can't
select it per-query.
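
To make the idea in that config comment concrete, here is a minimal Java sketch. It is an illustration only, not Solr's code: the class name FilterSortSketch, the BitSet standing in for a cached filter, and the sortValues array are all assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Solr.
class FilterSortSketch {

    /**
     * Takes the doc ids from a cached filter (assumed to hold every doc
     * matching the query) and returns the first n ids ordered by an
     * ascending per-document sort value. No scoring is involved.
     */
    static List<Integer> topDocs(BitSet cachedFilter, long[] sortValues, int n) {
        List<Integer> ids = new ArrayList<>();
        // The filter is the only source of document ids.
        for (int doc = cachedFilter.nextSetBit(0); doc >= 0;
                 doc = cachedFilter.nextSetBit(doc + 1)) {
            ids.add(doc);
        }
        // Apply the requested sort to those ids.
        ids.sort(Comparator.comparingLong(doc -> sortValues[doc]));
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }

    public static void main(String[] args) {
        BitSet filter = new BitSet();
        filter.set(1);
        filter.set(3);
        filter.set(4);
        long[] sortValues = {50, 40, 30, 20, 10}; // indexed by doc id
        System.out.println(topDocs(filter, sortValues, 2)); // prints [4, 3]
    }
}
```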

#2) Field collapsing on the "category" field might solve your problem
in the future (but it's not in Solr yet).

#3) Work I'm doing right now will push Filters down a level and check
them in tandem with the query instead of after.  This should speed
things up by at least a factor of 2 in your case.
https://issues.apache.org/jira/browse/SOLR-1165
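
A rough Java sketch of what "in tandem" could mean, under stated assumptions: this is not the SOLR-1165 patch, and DocIterator, advance and NO_MORE_DOCS are names invented for the example. The point is that the query side is asked to skip straight to the filter's next candidate, so documents outside the filter never get matched by the expensive query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the SOLR-1165 implementation.
class TandemFilterSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal iterator over ascending doc ids. */
    static final class DocIterator {
        private final int[] docs;
        private int pos = 0;

        DocIterator(int... docs) {
            this.docs = docs;
        }

        /** Returns the first doc id >= target, or NO_MORE_DOCS. */
        int advance(int target) {
            // Linear scan for simplicity; a real index would skip ahead.
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Leapfrogs the two iterators and collects the ids present in both. */
    static List<Integer> intersect(DocIterator query, DocIterator filter) {
        List<Integer> hits = new ArrayList<>();
        int q = query.advance(0);
        int f = filter.advance(0);
        while (q != NO_MORE_DOCS && f != NO_MORE_DOCS) {
            if (q == f) {
                hits.add(q);
                q = query.advance(q + 1);
                f = filter.advance(f + 1);
            } else if (q < f) {
                q = query.advance(f); // skip query docs the filter rejects
            } else {
                f = filter.advance(q); // skip filter docs the query misses
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        DocIterator query = new DocIterator(1, 4, 7, 9, 12);
        DocIterator filter = new DocIterator(4, 9, 10);
        System.out.println(intersect(query, filter)); // prints [4, 9]
    }
}
```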

I'm trying to get SOLR-1165 finished this week, and I'd love to see
how it affects your performance.
In the meantime, try useFilterForSortedQuery and let us know if it
still works (it's been turned off for a loooong time) ;-)

-Yonik
http://www.lucidimagination.com



On Wed, May 20, 2009 at 3:47 AM, Kent Fitch <kent.fi...@gmail.com> wrote:
> Hi,  I'm looking for some advice on how to add "base query" caching to SOLR.
>
> Our use-case for SOLR is:
>
> - a large Lucene index (32M docs, doubling in 6 months, 110GB increasing x 8
> in 6 months)
> - a frontend which presents views of this data in 5 "categories" by firing
> off 5 queries with the same search term but 5 different "fq" values
>
> For example, an originating query for "sydney harbour" generates 5 SOLR
> queries:
>
> - ../search?q=<complicated expansion of sydney harbour>&fq=category:books
> - ../search?q=<complicated expansion of sydney harbour>&fq=category:maps
> - ../search?q=<complicated expansion of sydney harbour>&fq=category:music
> etc
>
> The complicated expansion requires sloppy phrase matches, and the large
> database has lots of very large documents, so some queries take
> quite some time (tens to several hundreds of ms). We'd like to cache the
> results of the base query for a short time (long enough for all related
> queries to be issued).
>
> It looks like this isn't the use-case for queryResultCache, because its key
> is calculated in SolrIndexSearcher like this:
>
> key = new QueryResultKey(cmd.getQuery(), cmd.getFilterList(), cmd.getSort(),
> cmd.getFlags());
>
> That is, the filters are part of the key, and the cached result
> reflects the application of the filters. This works great for
> what it is probably designed for - supporting paging through results.
>
> So, I think our options are:
>
> - create a new queryComponent that invokes SolrIndexSearcher differently,
> and which has its own (short lived but long entry length) cache of the base
> query results
>
> - subclass or change SolrIndexSearcher, perhaps making it "pluggable",
> perhaps defining an optional new cache of base query results
>
> - create a subclass of the Lucene IndexSearcher which manages a cache of
> query results "hidden" from SolrIndexSearcher (and organise somehow for
> SolrIndexSearcher to use that subclass)
>
> Or perhaps I'm taking the wrong approach to this problem entirely!  Any
> advice is greatly appreciated.
>
> Kent Fitch
>
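
For reference, the "short lived" base-query cache described in the quoted message could look roughly like the Java sketch below. Everything in it is hypothetical: BaseQueryCacheSketch, the baseSearch function standing in for the expensive unfiltered search, and the use of plain doc-id BitSets are assumptions for illustration. It is not Solr code, and it ignores scores and ordering, which a real result cache would need to keep.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Solr.
class BaseQueryCacheSketch {

    private static final class Entry {
        final BitSet docs;
        final long expiresAtMillis;

        Entry(BitSet docs, long expiresAtMillis) {
            this.docs = docs;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final Function<String, BitSet> baseSearch; // expensive, unfiltered

    BaseQueryCacheSketch(long ttlMillis, Function<String, BitSet> baseSearch) {
        this.ttlMillis = ttlMillis;
        this.baseSearch = baseSearch;
    }

    /**
     * Returns the docs matching both the base query and the filter.
     * The cache key is the query alone, so the five per-category
     * requests for one search term share a single base result.
     */
    BitSet search(String query, BitSet filterDocs) {
        long now = System.currentTimeMillis();
        Entry entry = cache.get(query);
        if (entry == null || entry.expiresAtMillis <= now) {
            // Miss or stale: run the expensive search once and remember it.
            // Expired entries are only replaced on reuse, never evicted.
            entry = new Entry(baseSearch.apply(query), now + ttlMillis);
            cache.put(query, entry);
        }
        BitSet result = (BitSet) entry.docs.clone(); // keep cached set intact
        result.and(filterDocs);
        return result;
    }
}
```

With a filter BitSet per category, the first call for a search term pays for the base search and the remaining four calls only pay for a bitset intersection.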
