Re: documentCache clarification

Jay Luker Thu, 28 Oct 2010 07:55:42 -0700

On Wed, Oct 27, 2010 at 9:13 PM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:
>
> : schema.) My evidence for this is the documentCache stats reported by
> : solr/admin. If I request "rows=10&fl=id" followed by
> : "rows=10&fl=id,title" I would expect to see the 2nd request result in
> : a 2nd insert to the cache, but instead I see that the 2nd request hits
> : the cache from the 1st request. "rows=10&fl=*" does the same thing.
>
> your evidence is correct, but your interpretation is incorrect.
>
> the objects in the documentCache are lucene Documents, which contain a
> List of Field references.  when enableLazyFieldLoading=true is set, and
> there is a documentCache, the Document fetched from the IndexReader only
> contains the Fields specified in the fl, and all other Fields are marked
> as "LOAD_LAZY".
>
> When there is a cache hit on that uniqueKey at a later date, the Fields
> already loaded are used directly if requested, but the Fields marked
> LOAD_LAZY are (you guessed it) lazy loaded from the IndexReader and then
> the Document updates the reference to the newly actualized fields (which
> are no longer marked LOAD_LAZY)
>
> So with different "fl" params, the same Document Object is continually
> used, but the Fields in that Document grow as the fields requested (using
> the "fl" param) change.


Great stuff. Makes sense. Thanks for the clarification, and if no one
objects I'll update the wiki with some of this info.
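
To check my own understanding, here is a toy model of the behaviour
described above. It is only a sketch in Python, not Solr or Lucene code:
the class names, the dict standing in for the IndexReader, and the cache
key are all made up for illustration.

```python
class LazyDocument:
    """Toy stand-in for a cached Document whose fields load on demand."""

    def __init__(self, stored_fields, requested):
        # stored_fields plays the role of the IndexReader (an assumption).
        self._stored = stored_fields
        # Only the fields named in "fl" are loaded up front.
        self.loaded = {name: stored_fields[name] for name in requested}

    def get(self, name):
        # A field that was left lazy is read from the "index" on first
        # use and then kept, so the same object grows over time.
        if name not in self.loaded:
            self.loaded[name] = self._stored[name]
        return self.loaded[name]


class DocumentCache:
    """Toy cache that counts inserts and hits the way the admin page does."""

    def __init__(self, index):
        self.index = index
        self.docs = {}
        self.inserts = 0
        self.hits = 0

    def fetch(self, key, fl):
        doc = self.docs.get(key)
        if doc is None:
            doc = LazyDocument(self.index[key], fl)
            self.docs[key] = doc
            self.inserts += 1
        else:
            self.hits += 1
        return {name: doc.get(name) for name in fl}


index = {1: {"id": "1", "title": "Solr caches", "body": "..."}}
cache = DocumentCache(index)
cache.fetch(1, ["id"])            # first request: insert
cache.fetch(1, ["id", "title"])   # second request: hit, "title" loaded lazily
print(cache.inserts, cache.hits, sorted(cache.docs[1].loaded))
```

The second request counts as a hit rather than a second insert, which
matches what I saw in the stats, and the cached object ends up holding
both fields.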

I'm still not clear on this statement from the wiki's description of
the documentCache: "(Note: This cache cannot be used as a source for
autowarming because document IDs will change when anything in the
index changes so they can't be used by a new searcher.)"

Can anyone elaborate a bit on that? I think I've read it at least 10
times and I'm still unable to draw a mental picture. I'm wondering if
the document IDs referred to are the ones I'm defining in my schema,
or are they the underlying lucene ids, i.e. the ones that, according
to the Lucene in Action book, are "relative within each segment"?


> : will *not* result in an insert to queryResultCache. I have tried
> : various increments--10, 100, 200, 500--and it seems the magic number
> : is somewhere between 200 (cache insert) and 500 (no insert). Can
> : someone explain this?
>
> In addition to the <queryResultMaxDocsCached> config option already
> mentioned (which controls whether a DocList is cached based on its size)
> there is also the <queryResultWindowSize> config option which may confuse
> your cache observations.  if the window size is "50" and you ask for
> start=0&rows=10 what actually gets cached is "0-50" (assuming there are
> more than 50 results) so a subsequent request for start=10&rows=10 will be
> a cache hit.
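
If I'm reading that right, the windowing can be sketched roughly like
this. This is an illustration in Python, not Solr's actual code: the
function names are made up, and rounding the requested range up to the
next multiple of the window size is my assumption beyond the "0-50"
example given above.

```python
def cached_window(start, rows, window_size):
    """Return the (first, last_exclusive) range assumed to be cached.

    Assumes the query matches at least that many documents.
    """
    max_requested = start + rows
    # Round up to a whole number of windows (at least one).
    windows = max(1, -(-max_requested // window_size))
    return (0, windows * window_size)


def is_cache_hit(start, rows, cached_range):
    """A later page is a hit if it falls entirely inside the cached range."""
    first, last_exclusive = cached_range
    return first <= start and start + rows <= last_exclusive


window = cached_window(start=0, rows=10, window_size=50)
print(window)                          # range cached for start=0&rows=10
print(is_cache_hit(10, 10, window))    # start=10&rows=10
print(is_cache_hit(50, 10, window))    # start=50&rows=10
```

Under those assumptions the first request caches 0-50, the request for
start=10&rows=10 is served from the cache, and start=50&rows=10 is not.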

Just so I'm clear, does the queryResultCache operate in a similar
manner as the documentCache as to what is actually cached? In other
words, is it the caching of the docList object that is reported in the
cache statistics hits/inserts numbers? And that object would get
updated with a new set of ordered doc ids on subsequent, larger
requests. (I'm flailing a bit to articulate the question, I know). For
example, if my queryResultMaxDocsCached is set to 200 and I issue a
request with rows=500, then I won't get a docList object entry in the
queryResultCache. However, if I issue a request with rows=10, I will
get an insert, and then a later request for rows=500 would re-use and
update that original cached docList object. Right? And would it be
updated with the full list of 500 ordered doc ids or only 200?

Thanks,
--jay
