You can do that fairly easily by retrieving extra documents and
post-processing the results list.

You are likely to have a significant number of apparent duplicates this
way.
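
A rough sketch of that post-processing step, in Python. It assumes you
have already fetched more rows than you need (for example with a larger
rows value and fl=id,score) and turned the response into a list of
(doc_id, score) pairs sorted by score descending. The function name
dedupe_by_score and the tol parameter are made up for illustration; they
are not part of Solr.

```python
def dedupe_by_score(results, limit=10, tol=1e-9):
    """Keep the first document of each run of equal scores.

    results: iterable of (doc_id, score) pairs, sorted by score descending.
    limit:   maximum number of documents to return.
    tol:     scores closer than this are treated as identical (assumption).
    """
    kept = []
    last_score = None
    for doc_id, score in results:
        if len(kept) >= limit:
            break
        # Skip a document whose score matches the last one we kept.
        if last_score is not None and abs(score - last_score) <= tol:
            continue
        kept.append((doc_id, score))
        last_score = score
    return kept


if __name__ == "__main__":
    hits = [("doc1", 5.0), ("doc2", 5.0), ("doc3", 5.0), ("doc4", 4.9)]
    print(dedupe_by_score(hits))
```

If the filtered list comes back shorter than the limit, fetch the next
page and continue from where you left off.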

To really get rid of duplicates in results, it might be better to remove
them from the corpus by deploying something like LSH clustering.
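
To give an idea of what the LSH route looks like, here is a minimal
MinHash-with-banding sketch in plain Python. It is only an illustration
under my own assumptions (word 3-shingles, 32 hash functions, 8 bands,
MD5 as the hash); all function names are hypothetical, and a real
deployment would tune these and run before or during indexing.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles in text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(shingle_set, num_hashes=32):
    """One minimum per seeded hash; similar sets share many minima."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.md5(f"{seed}:{s}".encode("utf-8")).digest()[:8], "big")
            for s in shingle_set
        ))
    return signature


def lsh_candidate_groups(docs, num_hashes=32, bands=8):
    """Group doc ids whose signatures agree on at least one whole band.

    docs: dict mapping doc_id -> text.
    Returns a set of frozensets of doc ids that are candidate duplicates.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    return {frozenset(ids) for ids in buckets.values() if len(ids) > 1}
```

Each group is only a set of candidates; you would still compare the
members directly before dropping all but one from the corpus.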

On Thu, Nov 24, 2011 at 5:04 PM, Fred Zimmerman <zimzaz....@gmail.com> wrote:

> I have a corpus that has a lot of identical or nearly identical documents.
> I'd like to return only the unique ones (excluding the "nearly identical"
> which are redirects).  I notice that all the identical/nearly identicals
> have identical Solr scores. How can I tell Solr to throw out all the
> successive documents in an answer set that have identical scores?
>
> doc 1 score 5.0
> doc 2 score 5.0
> doc 3 score 5.0
> doc 4 score 4.9
>
> skip docs 2 and 3
>
> bring back 10 docs with unique scores
>
