On Wed, Oct 27, 2010 at 9:13 PM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:
>
> : schema.) My evidence for this is the documentCache stats reported by
> : solr/admin. If I request "rows=10&fl=id" followed by
> : "rows=10&fl=id,title" I would expect to see the 2nd request result in
> : a 2nd insert to the cache, but instead I see that the 2nd request hits
> : the cache from the 1st request. "rows=10&fl=*" does the same thing.
>
> your evidence is correct, but your interpretation is incorrect.
>
> the objects in the documentCache are lucene Documents, which contain a
> List of Field references. when enableLazyFieldLoading=true is set, and
> there is a documentCache, a Document fetched from the IndexReader only
> contains the Fields specified in the fl, and all other Fields are marked
> as "LOAD_LAZY".
>
> When there is a cache hit on that uniqueKey at a later date, the Fields
> already loaded are used directly if requested, but the Fields marked
> LOAD_LAZY are (you guessed it) lazy loaded from the IndexReader and then
> the Document updates the reference to the newly actualized fields (which
> are no longer marked LOAD_LAZY)
>
> So with different "fl" params, the same Document Object is continually
> used, but the Fields in that Document grow as the fields requested (using
> the "fl" param) change.

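To check my understanding of the lazy loading described above, here is a toy model in Python. It is not Solr or Lucene code: CachedDocument, fetch_fields, STORED and the stats dict are names I made up, and it only mimics the behaviour as described (one cache entry per uniqueKey, with fields filled in as later "fl" values ask for them).

```python
# Toy model only: CachedDocument, fetch_fields and STORED are made-up names,
# not Solr or Lucene classes.

LOAD_LAZY = object()  # sentinel for "field exists but is not loaded yet"

# Pretend stored fields on disk, keyed by uniqueKey.
STORED = {"doc1": {"id": "doc1", "title": "Hello", "body": "Some long text"}}

index_reads = []  # records every simulated read from the index


def fetch_fields(key, names):
    """Simulate reading the named stored fields from the IndexReader."""
    index_reads.append((key, tuple(sorted(names))))
    return {n: STORED[key][n] for n in names}


class CachedDocument:
    def __init__(self, key, fl):
        self.key = key
        # Every stored field gets a slot; only the ones in fl are loaded.
        self.fields = {n: LOAD_LAZY for n in STORED[key]}
        self.fields.update(fetch_fields(key, fl))

    def get(self, fl):
        # Load any requested field that is still marked LOAD_LAZY.
        missing = [n for n in fl if self.fields[n] is LOAD_LAZY]
        if missing:
            self.fields.update(fetch_fields(self.key, missing))
        return {n: self.fields[n] for n in fl}


document_cache = {}
stats = {"inserts": 0, "hits": 0}


def get_document(key, fl):
    doc = document_cache.get(key)
    if doc is None:
        doc = CachedDocument(key, fl)
        document_cache[key] = doc
        stats["inserts"] += 1
    else:
        stats["hits"] += 1
    return doc.get(fl)


get_document("doc1", ["id"])           # cache insert, loads "id"
get_document("doc1", ["id", "title"])  # cache hit, lazily loads "title"
```

In this model the second request counts as a hit rather than a second insert, which matches the stats I saw in solr/admin; the only extra index read is for the newly requested "title" field.
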
Great stuff. Makes sense. Thanks for the clarification, and if no one
objects I'll update the wiki with some of this info.

I'm still not clear on this statement from the wiki's description of
the documentCache:

"(Note: This cache cannot be used as a source for autowarming because
document IDs will change when anything in the index changes so they
can't be used by a new searcher.)"

Can anyone elaborate a bit on that? I think I've read it at least 10
times and I'm still unable to draw a mental picture. I'm wondering if
the document IDs referred to are the ones I'm defining in my schema, or
are they the underlying lucene ids, i.e. the ones that, according to
the Lucene in Action book, are "relative within each segment"?

> : will *not* result in an insert to queryResultCache. I have tried
> : various increments--10, 100, 200, 500--and it seems the magic number
> : is somewhere between 200 (cache insert) and 500 (no insert). Can
> : someone explain this?
>
> In addition to the <queryResultMaxDocsCached> config option already
> mentioned (which controls whether a DocList is cached based on its size)
> there is also the <queryResultWindowSize> config option which may confuse
> your cache observations. if the window size is "50" and you ask for
> start=0&rows=10 what actually gets cached is "0-50" (assuming there are
> more than 50 results) so a subsequent request for start=10&rows=10 will be
> a cache hit.

Just so I'm clear, does the queryResultCache operate in a similar
manner as the documentCache as to what is actually cached? In other
words, is it the caching of the docList object that is reported in the
cache statistics hits/inserts numbers? And would that object get updated
with a new set of ordered doc ids on subsequent, larger requests? (I'm
flailing a bit to articulate the question, I know.)

For example, if my queryResultMaxDocsCached is set to 200 and I issue a
request with rows=500, then I won't get a docList object entry in the
queryResultCache. However, if I issue a request with rows=10, I will
get an insert, and then a later request for rows=500 would re-use and
update that original cached docList object. Right? And would it be
updated with the full list of 500 ordered doc ids or only 200?

Thanks,
--jay
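
P.S. Here is the queryResultWindowSize arithmetic as I currently read it, sketched in Python. The only case confirmed in this thread is window size 50 with start=0&rows=10 caching "0-50"; rounding larger requests up to the next multiple of the window size is my assumption. The function names are hypothetical, and the sketch deliberately ignores queryResultMaxDocsCached and result sets smaller than the window.

```python
# Assumption: the cached range is rounded up to a multiple of the window
# size. Only the window=50, start=0, rows=10 case is confirmed in the thread.

def cached_range_end(start, rows, window_size):
    """Return how many leading doc ids would be cached for this request."""
    requested = start + rows
    if requested <= window_size:
        return window_size
    # Round up to the next multiple of window_size (ceiling division).
    return -(-requested // window_size) * window_size


def is_cache_hit(start, rows, cached_end):
    """A later request can be served from cache if it fits the cached range."""
    return start + rows <= cached_end


end = cached_range_end(start=0, rows=10, window_size=50)
second_page_hit = is_cache_hit(start=10, rows=10, cached_end=end)
```

If that reading is right, paging through the first 50 results only touches the index once, and the next miss happens when a request reaches past position 50.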