Using FieldCache in SolrIndexSearcher for distributed id retrieval

Michael Ryan Tue, 29 Jan 2013 19:55:18 -0800

Following up from a post I made back in 2011...

> I am a user of Solr 3.2 and I make use of the distributed search
> capabilities of Solr using a fairly simple architecture of a coordinator +
> some shards.
>
> Correct me if I am wrong: In a standard distributed search with
> QueryComponent, the first query sent to the shards asks for fl=myUniqueKey
> or fl=myUniqueKey,score. When the response is being generated to send back
> to the coordinator, SolrIndexSearcher.doc(int i, Set<String> fields) is
> called for each document. As I understand it, this will read each document
> from the index _on disk_ and retrieve the myUniqueKey field value for each
> document.
>
> My idea is to have a FieldCache for the myUniqueKey field in
> SolrIndexSearcher (or somewhere else?) that would be used in cases where
> the only field that needs to be retrieved is myUniqueKey. Is this something
> that would improve performance?
>
> In our actual setup, we are using an extended version of QueryComponent
> that queries for a couple other fields besides myUniqueKey in the initial
> query to the shards, and it asks a lot of rows when doing so, many more
> than what the user ends up getting back when they see the results. (The
> reasons for this are complicated and aren't related much to this question.)
> We already maintain FieldCaches for the fields that we are asking for, but
> for other purposes. Would it make sense to utilize these FieldCaches in
> SolrIndexSearcher? Is this something that anyone else has done before?


We did end up doing this inside of the SolrIndexSearcher.doc() method. 
I check whether the fields Set contains only fields that I am willing to 
serve from the FieldCache, and if so, build the Document from the data in 
the FieldCache. It looks roughly like this...

// d is the Document that doc() returns; i is the doc id passed to doc().
if (fieldNamesToRetrieveFromFieldCache.containsAll(fields)) {
  // Every requested field is cached, so skip the stored-field read entirely.
  d = new Document();
  if (fields.contains("myUniqueKeyField")) {
    long value = FieldCache.DEFAULT.getLongs(reader, "myUniqueKeyField")[i];
    // The cache holds 0 for docs without a value, so 0 is treated as missing.
    if (value != 0) {
      d.add(new NumericField("myUniqueKeyField", Field.Store.YES, true).setLongValue(value));
    }
  }
  if (fields.contains("someOtherField")) {
    long value = FieldCache.DEFAULT.getLongs(reader, "someOtherField")[i];
    if (value != 0) {
      d.add(new NumericField("someOtherField", Field.Store.YES, true).setLongValue(value));
    }
  }
}

I don't have a more generalized patch that makes it easily configurable, but 
the idea is fairly simple.
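
For anyone who wants to experiment with the idea outside of Solr, here is a 
minimal, self-contained sketch of the same check. It is not Solr or Lucene 
code: CachedFieldLookup, put() and the Map-based return value are 
hypothetical stand-ins, with plain long[] arrays playing the role of 
FieldCache.DEFAULT.getLongs(). Returning null means "fall back to the 
normal stored-field read".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the patched doc() logic; not part of Solr or Lucene.
public class CachedFieldLookup {
    // Field name -> per-document values indexed by doc id (the FieldCache role).
    private final Map<String, long[]> cache = new HashMap<String, long[]>();

    public void put(String field, long[] valuesByDocId) {
        cache.put(field, valuesByDocId);
    }

    // Returns the requested values for docId, or null when any requested
    // field is not cached and the caller has to read the document from disk.
    public Map<String, Long> doc(int docId, Set<String> fields) {
        if (fields == null || !cache.keySet().containsAll(fields)) {
            return null;
        }
        Map<String, Long> doc = new LinkedHashMap<String, Long>();
        for (String field : fields) {
            long value = cache.get(field)[docId];
            // As in the snippet above, 0 is treated as "no value for this doc".
            if (value != 0) {
                doc.put(field, value);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        CachedFieldLookup lookup = new CachedFieldLookup();
        lookup.put("myUniqueKeyField", new long[] {101L, 0L, 103L});
        Set<String> idOnly = new HashSet<String>(Arrays.asList("myUniqueKeyField"));
        System.out.println(lookup.doc(0, idOnly));
    }
}
```

Making the set of cacheable fields configurable would then mostly be a 
matter of deciding which fields get put() into the lookup.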

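We have had good results from this. For a system of n shards, this reduces the 
average number of docs to retrieve from disk per shard from rows to rows/n: 
the first-phase request for ids is answered from the FieldCache, so the only 
docs read from disk are the ones the coordinator asks each shard for when 
building the final response. For requests with a large rows parameter (e.g., 
1000) and many shards, this makes a noticeable difference in response time. 
Obviously this isn't the typical Solr use case, so your mileage may vary. 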
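
To put numbers on the rows versus rows/n estimate, here is a small sketch 
that just does the arithmetic. DiskReadEstimate and its method names are 
made up for illustration, and it assumes the final results are spread 
evenly across the shards, which real data will only approximate.

```java
// Hypothetical helper that restates the estimate above; not a measurement.
public class DiskReadEstimate {
    // Without the FieldCache path, each shard reads every doc it returns ids for.
    static double beforePerShard(int rows, int shards) {
        return rows;
    }

    // With the FieldCache path, only the docs in the final response are read,
    // and those are assumed to be spread evenly over the shards.
    static double afterPerShard(int rows, int shards) {
        return (double) rows / shards;
    }

    public static void main(String[] args) {
        int rows = 1000;
        int shards = 10;
        System.out.println("before: " + beforePerShard(rows, shards));
        System.out.println("after:  " + afterPerShard(rows, shards));
    }
}
```

With rows=1000 and 10 shards, that works out to an average of 100 docs read 
from disk per shard instead of 1000.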

-Michael
