Following up from a post I made back in 2011...

> I am a user of Solr 3.2 and I make use of the distributed search
> capabilities of Solr using a fairly simple architecture of a
> coordinator + some shards.
>
> Correct me if I am wrong: In a standard distributed search with
> QueryComponent, the first query sent to the shards asks for
> fl=myUniqueKey or fl=myUniqueKey,score. When the response is being
> generated to send back to the coordinator,
> SolrIndexSearcher.doc(int i, Set<String> fields) is called for each
> document. As I understand it, this will read each document from the
> index _on disk_ and retrieve the myUniqueKey field value for each
> document.
>
> My idea is to have a FieldCache for the myUniqueKey field in
> SolrIndexSearcher (or somewhere else?) that would be used in cases
> where the only field that needs to be retrieved is myUniqueKey. Is
> this something that would improve performance?
>
> In our actual setup, we are using an extended version of
> QueryComponent that queries for a couple other fields besides
> myUniqueKey in the initial query to the shards, and it asks a lot of
> rows when doing so, many more than what the user ends up getting back
> when they see the results. (The reasons for this are complicated and
> aren't related much to this question.) We already maintain FieldCaches
> for the fields that we are asking for, but for other purposes. Would
> it make sense to utilize these FieldCaches in SolrIndexSearcher? Is
> this something that anyone else has done before?
We did end up doing this inside the SolrIndexSearcher.doc() method. I
check whether the fields Set contains only fields that I am willing to
serve from the FieldCache, and if so, I build the Document from the
FieldCache data instead of reading it from disk. It looks roughly like
this:

    // fields can be null (meaning "all fields"), so guard before containsAll()
    if (fields != null && fieldNamesToRetrieveFromFieldCache.containsAll(fields)) {
      d = new Document();
      if (fields.contains("myUniqueKeyField")) {
        long value = FieldCache.DEFAULT.getLongs(reader, "myUniqueKeyField")[i];
        // The cache array holds 0 for docs without a value, so 0 is treated
        // as "missing" (a genuine value of 0 would be skipped too).
        if (value != 0) {
          d.add(new NumericField("myUniqueKeyField", Field.Store.YES, true).setLongValue(value));
        }
      }
      if (fields.contains("someOtherField")) {
        long value = FieldCache.DEFAULT.getLongs(reader, "someOtherField")[i];
        if (value != 0) {
          d.add(new NumericField("someOtherField", Field.Store.YES, true).setLongValue(value));
        }
      }
    }

I don't have a more generalized patch that makes this easily
configurable, but the idea is fairly simple.

We have had good results from this. For a system of n shards, it
reduces the average number of docs retrieved from disk per shard from
rows to rows/n, because only the final results, which are spread
across the shards, still need their stored fields read in the second
phase. For requests with a large rows parameter (e.g., 1000) and many
shards, this makes a noticeable difference in response time. Obviously
this isn't the typical Solr use case, so your mileage may vary.

-Michael
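
For anyone who wants to play with the idea without a Solr checkout, here is a minimal, self-contained sketch of the same shortcut. It is only a toy model and rests on several assumptions: the class CachedDocLookup, its sample() helper, and the "id" and "price" fields are all made up for illustration and are not Solr or Lucene APIs. A plain long[] stands in for the FieldCache and an in-memory list stands in for stored fields on disk, with a counter so you can see how many "disk" reads are avoided.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the shortcut. None of these names are Solr or Lucene APIs. */
class CachedDocLookup {
    // Hypothetical stand-in for FieldCache: field name -> one long per doc id.
    private final Map<String, long[]> fieldCache;
    // Hypothetical stand-in for stored fields on disk: one field map per doc id.
    private final List<Map<String, Long>> storedDocs;
    // Counts how often the slow "disk" path was taken.
    private int diskReads = 0;

    CachedDocLookup(Map<String, long[]> fieldCache, List<Map<String, Long>> storedDocs) {
        this.fieldCache = fieldCache;
        this.storedDocs = storedDocs;
    }

    /** Returns the requested fields of one doc; fields == null means "all stored fields". */
    Map<String, Long> doc(int docId, Set<String> fields) {
        // Fast path: every requested field is available in the in-memory cache.
        if (fields != null && fieldCache.keySet().containsAll(fields)) {
            Map<String, Long> d = new HashMap<>();
            for (String f : fields) {
                long value = fieldCache.get(f)[docId];
                // As in the snippet above, 0 is treated as "no value for this doc".
                if (value != 0) {
                    d.put(f, value);
                }
            }
            return d;
        }
        // Slow path: simulate loading the stored document from disk.
        diskReads++;
        Map<String, Long> d = new HashMap<>();
        for (Map.Entry<String, Long> e : storedDocs.get(docId).entrySet()) {
            if (fields == null || fields.contains(e.getKey())) {
                d.put(e.getKey(), e.getValue());
            }
        }
        return d;
    }

    int diskReads() {
        return diskReads;
    }

    /** Builds numDocs docs with a cached "id" field and a stored-only "price" field. */
    static CachedDocLookup sample(int numDocs) {
        long[] ids = new long[numDocs];
        List<Map<String, Long>> stored = new ArrayList<>();
        for (int i = 0; i < numDocs; i++) {
            ids[i] = 100 + i;
            Map<String, Long> m = new HashMap<>();
            m.put("id", ids[i]);
            m.put("price", 5L * i);
            stored.add(m);
        }
        Map<String, long[]> cache = new HashMap<>();
        cache.put("id", ids);
        return new CachedDocLookup(cache, stored);
    }

    public static void main(String[] args) {
        CachedDocLookup shard = sample(1000);
        // Phase one: the coordinator asks this shard for 1000 ids only.
        for (int i = 0; i < 1000; i++) {
            shard.doc(i, Set.of("id"));
        }
        System.out.println("disk reads after phase one: " + shard.diskReads());
        // Phase two: only this shard's share of the final results (say rows/n
        // with n = 4) is fetched with all stored fields.
        for (int i = 0; i < 250; i++) {
            shard.doc(i, null);
        }
        System.out.println("disk reads after phase two: " + shard.diskReads());
    }
}
```

In this toy run the 1000 id-only lookups never touch the simulated disk, and only the 250 full fetches are counted, which is the rows versus rows/n effect described above. A real version would also have to decide how to handle fields whose legitimate value is 0.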