: I don't do Solr, but had this thought that might be interesting: instead
: of associating the cache with an IndexSearcher, it could stand by itself.
: When new documents are inserted (if I understand it right, Solr has
: some kind of notification system for this) the cached queries are run
: against the new documents (indexed in a Memory- or InstantiatedIndex
: [Lucene issue 550]) to see if they affect the cached results. If not, the
: cache is kept; if so, the cache is rebuilt or removed. With pre-tokenized
: fields (Lucene issue 580) it would not consume that much resources at
: all, but perhaps that will not fit in the Solr scheme.

I may be misunderstanding your idea, so let me reword it the way I
understand it and you tell me if I'm missing something...

  1) as new docs come in, add them to a purely in memory index
  2) when it becomes time to "commit" the new documents, test all queries
     in the cache against this in memory index.
  3) any query in the cache which has a hit on this in-memory index should
     be invalidated; any query which does not have a hit is still valid.
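The three steps above can be sketched as a toy invalidation check. This is purely illustrative: single-term "queries", and a plain Set of terms standing in for the in-memory index of newly added docs -- none of these names (`survivingQueries`, etc.) are Lucene or Solr APIs.

```java
import java.util.*;

public class WarmCheck {
    // Hypothetical sketch: a cached query stays valid only if none of
    // its terms appear in the newly added (in-memory) documents.
    static Set<String> survivingQueries(Set<String> cachedQueryTerms,
                                        Set<String> termsInNewDocs) {
        Set<String> valid = new HashSet<>(cachedQueryTerms);
        // step 3: a hit against the in-memory index invalidates the entry
        valid.removeAll(termsInNewDocs);
        return valid;
    }

    public static void main(String[] args) {
        Set<String> cached   = new HashSet<>(Arrays.asList("lucene", "ruby"));
        Set<String> newTerms = new HashSet<>(Arrays.asList("lucene", "solr"));
        System.out.println(survivingQueries(cached, newTerms)); // prints [ruby]
    }
}
```

Real queries would of course have to be executed against a real in-memory index (e.g. MemoryIndex) rather than compared term-for-term, but the keep/invalidate decision is the same shape.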

...this could probably work if the index were purely additive (ie: only
ever grew over time) but I don't think it's feasible in an index in which
deletes are executed ... not only would you need to check if one of the
cached queries matched on the deleted document, but the next segment merge
could collapse doc ids above deleted docs which were totally unrelated to
any docs that were added or deleted -- so you would think cached results
are still valid even though the doc ids in the cache don't correspond to
the same documents anymore.
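To make the doc-id collapse concrete, here's a simplified single-deletion sketch (hypothetical helper, not a Lucene API): when a merge expunges a deleted doc, every doc id above it slides down by one, so a cached id for an untouched document can silently start naming a different one.

```java
public class DocIdShift {
    // Simplified: after a merge expunges the doc with id "deletedId",
    // every id above it shifts down by one.
    static int remap(int oldId, int deletedId) {
        return oldId > deletedId ? oldId - 1 : oldId;
    }

    public static void main(String[] args) {
        int[] cachedResult = {3, 7}; // cached hits for some query
        int deleted = 5;             // doc 5 deleted, unrelated to that query
        for (int id : cachedResult) {
            System.out.println(id + " -> " + remap(id, deleted));
        }
        // 3 -> 3, but 7 -> 6: cached id 7 now points at a different document,
        // even though no doc matching the query was added or deleted.
    }
}
```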

: Any immediate comments on that? I'd like to implement something like
: this for my self as I notice the CPU working a bit harder than I want it
: to every time I update an index.

Solr reduces this impact by letting you configure "cache warming" when
changes are committed.  The gist of it is that while the "old"
IndexSearcher is still being used by external requests (and still using
its cache) a new "on deck" IndexSearcher is opened, and an internal thread
runs queries against it (the results of which are cached) for all of the
"best" items in the previous cache.  Once a certain number of cache
entries have been seeded, the "on deck" IndexSearcher is swapped in and
used for all future queries.
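For reference, this seeding is driven by the autowarmCount on each cache in solrconfig.xml -- something along these lines (sizes here are just example values):

```xml
<!-- solrconfig.xml: autowarmCount controls how many of the "best"
     entries from the old cache are re-executed against the new searcher -->
<filterCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="128"/>
```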

you can even configure custom actions to take place on commit or optimize
(using "listeners") if you want different prepopulation of your caches
each time.  For example, I wrote a warming plugin that crawls the metadata
in my index and caches all sorts of Filters for it, up to a configurable
amount of time, at which point it gives up -- I have it configured to run
on server startup, aka "firstSearcher".
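A built-in example of such a listener is QuerySenderListener, which fires a fixed list of warming queries; a custom plugin like the one described above would be declared the same way, with its own class name. A sketch (the query values are placeholders):

```xml
<!-- solrconfig.xml: run warming queries when a new searcher is opened,
     and on server startup ("firstSearcher") -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>
```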



-Hoss
