You can do that pretty easily by just retrieving extra documents and post-processing the results list.
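A minimal sketch of that post-processing step, in Python. It assumes the hits have already been fetched from Solr as (id, score) pairs sorted by descending score, for example by asking for `fl=id,score` with a `rows` value larger than the 10 you actually need. The function name `unique_by_score` and the `tol` parameter are made up for illustration and are not part of Solr.

```python
from typing import Iterable, List, Tuple

Hit = Tuple[str, float]  # (document id, score); this shape is an assumption


def unique_by_score(hits: Iterable[Hit], wanted: int = 10, tol: float = 0.0) -> List[Hit]:
    """Keep the first hit of each run of equal scores, up to `wanted` hits.

    `hits` must already be sorted by descending score. Two scores count as
    equal when they differ by at most `tol` (hypothetical knob; 0.0 means
    exact equality).
    """
    kept: List[Hit] = []
    last_score = None
    for doc_id, score in hits:
        # Skip a hit whose score matches the most recently kept hit.
        if last_score is not None and abs(score - last_score) <= tol:
            continue
        kept.append((doc_id, score))
        last_score = score
        if len(kept) == wanted:
            break
    return kept


# The example from the question: docs 2 and 3 are skipped.
hits = [("doc 1", 5.0), ("doc 2", 5.0), ("doc 3", 5.0), ("doc 4", 4.9)]
print(unique_by_score(hits))  # → [('doc 1', 5.0), ('doc 4', 4.9)]
```

If fewer than `wanted` hits come back, the over-fetch was too small and you would need to request more rows and run the filter again.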
You are likely to have a significant number of apparent duplicates this way. To really get rid of duplicates in results, it might be better to remove them from the corpus by deploying something like LSH clustering.

On Thu, Nov 24, 2011 at 5:04 PM, Fred Zimmerman <zimzaz....@gmail.com> wrote:

> I have a corpus that has a lot of identical or nearly identical documents.
> I'd like to return only the unique ones (excluding the "nearly identical"
> which are redirects). I notice that all the identical/nearly identicals
> have identical Solr scores. How can I tell Solr to throw out all the
> successive documents in an answer set that have identical scores?
>
> doc 1 score 5.0
> doc 2 score 5.0
> doc 3 score 5.0
> doc 4 score 4.9
>
> skip docs 2 and 3
>
> bring back 10 docs with unique scores
>