Re: Solr and FieldCache

Walter Ferrara Thu, 20 Sep 2007 11:36:38 -0700

About the stored/indexed difference: ID is a string (solr.StrField), so
FieldCache gives me what I need.


I'm just wondering: since this cached object could (theoretically) be
pretty big, do I need to worry about OOM errors? I know that FieldCache
uses weak maps, so I presume the cached array for the older reader(s) will
be gc-ed when the reader is no longer referenced (i.e. when Solr loads
the new one, after its warmup and so on). Is that right?
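
To make the assumption concrete, here is a minimal sketch of the behaviour
I have in mind, using only the JDK. FakeReader and WeakFieldCacheSketch are
hypothetical stand-ins, not Lucene classes, and the cached values are
placeholders; the only point is that the outer map holds readers weakly, so
a reader's arrays become collectable once nothing else references that
reader.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for an index reader; only the doc count matters here.
class FakeReader {
    final int maxDoc;
    FakeReader(int maxDoc) { this.maxDoc = maxDoc; }
}

// Toy per-reader cache keyed weakly by reader. Illustrative only; this is an
// assumption about the general pattern, not Lucene's actual implementation.
class WeakFieldCacheSketch {
    // Weak keys: an entry is eligible for removal once its reader is no longer
    // strongly referenced elsewhere. The cached arrays must not point back to
    // the reader, or the entry could never be collected.
    private final Map<FakeReader, Map<String, String[]>> cache = new WeakHashMap<>();
    private int loads = 0;

    synchronized String[] getStrings(FakeReader reader, String field) {
        Map<String, String[]> perReader =
                cache.computeIfAbsent(reader, r -> new HashMap<>());
        String[] values = perReader.get(field);
        if (values == null) {
            // First request for this reader/field: build the whole array once.
            values = new String[reader.maxDoc];
            for (int doc = 0; doc < values.length; doc++) {
                values[doc] = field + "-" + doc; // placeholder value
            }
            perReader.put(field, values);
            loads++;
        }
        return values;
    }

    // Number of times an array had to be built from scratch.
    synchronized int loadCount() { return loads; }

    public static void main(String[] args) {
        WeakFieldCacheSketch fieldCache = new WeakFieldCacheSketch();
        FakeReader oldReader = new FakeReader(3);

        // Fetch the array once, outside the loop, then index into it.
        String[] ids = fieldCache.getStrings(oldReader, "ID");
        for (int doc = 0; doc < ids.length; doc++) {
            System.out.println(ids[doc]);
        }

        // A new reader (e.g. after a commit) triggers a fresh load.
        FakeReader newReader = new FakeReader(5);
        fieldCache.getStrings(newReader, "ID");
        System.out.println("loads = " + fieldCache.loadCount());
    }
}
```

When collection actually happens is up to the garbage collector, so while
the old and new readers overlap during warmup, both arrays can be live at
the same time.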

Thanks
--

J.J. Larrea wrote:
> At 5:30 PM +0200 9/20/07, Walter Ferrara wrote:
>   
>> I have an index with several fields, but just one stored: ID (string,
>> unique).
>> I need to access that ID field for each of the top "nodes" docs in my
>> results (this is done inside a handler I wrote); the code looks like:
>>
>>     Hits hits = searcher.search(query);
>>     for(int i=0; i<nodes; i++) {
>>            id[i]=hits.doc(i).get("ID");
>>            score[i]=hits.score(i);
>>     }
>>
>> I noticed that this retrieval code is slow.
>>
>> If I use the FieldCache, like:
>> id[i]=FieldCache.DEFAULT.getStrings(searcher.getReader(),
>> "ID")[hits.id(i)];
>>     
>
> I assume you're putting FieldCache.DEFAULT.getStrings(searcher.getReader(),
> "ID") in an array outside the loop, saving 2 redundant method calls per 
> iteration.
>
>   
>> after the first execution (initializing the cache takes some
>> time), it seems to run much faster.
>>     
>
> Do note that FieldCache.DEFAULT is caching the indexed values, not the stored 
> values.  Since your field is an ID you are probably indexing it in such a way 
> that both are identical, e.g. with KeywordTokenizer, so you're not seeing a 
> difference.
>
>   
>> But what happens when Solr reloads the index (after a commit or an
>> optimize, for example)?
>> Will it refresh the cache with the new reader (in the warmup process?),
>> or will it be the first query execution of that code (with the new
>> reader) that forces the refresh? (This could mean that every first query
>> after a reload will be slower.)
>>     
>
> It is refreshed by Lucene the first time the FieldCache array is requested 
> from the new IndexReader.
>
>   
>> Is there any way to tell Solr to cache and warm up this "ID" field
>> when needed?
>>     
>
> Absolutely, just put a warmup query in solrconfig.xml which makes a request
> that invokes FieldCache.DEFAULT.getStrings on that field.
>
> Simplest would probably be to invoke your custom handler, perhaps passing 
> arguments that limit it to only processing one document to limit the data 
> which gets cached; since getStrings returns the entire array, one pass 
> through your loop is fine.
>
> If that's not easy with your handler, you could achieve the same effect by
> setting up a handler which facets on the ID field, sorts by ID
> (facet.sort=false), and asks for only a single value (facet.limit=1). The
> entire id[docid] array will get scanned to count references to that ID, but
> that ensures it gets paged in.
>
> - J.J.
>
>
