Hello, I'm experimenting with ways to add some diversity to search results by re-ranking them. For example, I might want to take the top 100 docs (sorted by score) and rearrange them so that no more than 2 results share a particular attribute x within any 20-result block. It's a best-effort algorithm, since more than 10 of the 100 results may have x (5 blocks of 20, at most 2 per block), in which case the limit can't be met everywhere. And if the original list already satisfies the diversity goal, the ordering is left unchanged. So 2 questions:
1. What's a good way to implement this? The most obvious solution (at least for this particular example) might be field collapsing. But we do need faceting as well. And the two don't yet work together according to http://wiki.apache.org/solr/FieldCollapsing . It also wouldn't be applicable if the re-ranking function depended on things other than field values (like the score). Custom sorting (FieldComparatorSource) doesn't seem to work either because the relative ordering of 2 docs depends not only on their field values but on what other docs match the query as well. So right now I'm doing post-processing: sort by score, look up x for each top doc, then re-arrange if necessary. Is there a better way?

2. We need a fast way to fetch x for a large (100s) number of docs. It'd be great if the doc()/document() methods could automatically use the field cache - perhaps with something like https://issues.apache.org/jira/browse/SOLR-1961 . That hasn't been accepted, though. So I wrote this on top of the Solr API:

    private static void loadCachedFields(SolrDocument doc, SolrIndexSearcher searcher,
                                         int docId, final Set<String> cachedFields)
        throws IOException {
      // find leaf reader and doc id offset for this doc
      SolrIndexReader reader = searcher.getReader();
      int[] offsets = reader.getLeafOffsets();
      int idx = SolrIndexReader.readerIndex(docId, offsets);
      SolrIndexReader leafReader = reader.getLeafReaders()[idx];
      int offset = offsets[idx];

      IndexSchema schema = searcher.getSchema();
      for (String f : cachedFields) {
        Object val;
        if (schema.getField(f).getType() instanceof IntField) {
          val = FieldCache.DEFAULT.getInts(leafReader, f)[docId - offset];
        } else ...
        doc.addField(f, val);
      }
    }

(I borrowed the doc id offset code from QueryComponent.) Does this look like a reasonable solution?

Thanks!
- zhi-da
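
P.S. To make the post-processing step in question 1 concrete, here is a minimal sketch of one possible greedy re-rank. It is not the exact code in use. It assumes "block" means consecutive, non-overlapping groups of blockSize results (1-20, 21-40, ...), that the input list is already sorted by score, and that Java 8+ is available. The class name DiversityReranker, the method rerank and its parameters are made up for illustration, and keyOf stands in for whatever looks up x for a doc.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Solr or Lucene.
public class DiversityReranker {

    /**
     * Best-effort greedy re-rank. For each output slot, take the
     * highest-ranked remaining item whose key has been used fewer than
     * maxPerBlock times in the current block. If no remaining item
     * qualifies, fall back to the highest-ranked remaining item.
     * Assumes blocks are consecutive and non-overlapping.
     */
    public static <T, K> List<T> rerank(List<T> ranked,
                                        Function<T, K> keyOf,
                                        int blockSize,
                                        int maxPerBlock) {
        List<T> remaining = new LinkedList<>(ranked); // still in score order
        List<T> out = new ArrayList<>(ranked.size());
        Map<K, Integer> counts = new HashMap<>();     // key usage in current block

        while (!remaining.isEmpty()) {
            // A new block starts every blockSize results: reset the counts.
            if (out.size() % blockSize == 0) {
                counts.clear();
            }
            T pick = null;
            Iterator<T> it = remaining.iterator();
            while (it.hasNext()) {
                T candidate = it.next();
                if (counts.getOrDefault(keyOf.apply(candidate), 0) < maxPerBlock) {
                    pick = candidate;
                    it.remove();
                    break;
                }
            }
            if (pick == null) {
                // Constraint cannot be met for this slot; keep score order.
                pick = remaining.remove(0);
            }
            counts.merge(keyOf.apply(pick), 1, Integer::sum);
            out.add(pick);
        }
        return out;
    }
}
```

With blockSize = 3, maxPerBlock = 1 and the first character as the key, the input [a1, a2, a3, b1, c1, a4] comes out as [a1, b1, c1, a2, a3, a4]: the first block is fully diversified and the second falls back to score order because only "a" docs remain. A list that already meets the limit is returned in its original order, since the first remaining item always qualifies. The scan is quadratic in the worst case, which should be fine for around 100 docs.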