What you need to do is calculate a hash of each document's content (using
any message digest algorithm you want, such as MD5 or SHA-1), store it in a
field, and then read up on Solr's field collapsing capabilities to collapse
results on that field. It should not be too complicated.
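
As a rough sketch of the idea, assuming a hypothetical single-valued string
field named "signature" in your schema: compute the digest at index time,
then ask Solr to group on that field at query time. The helper names below
are made up for illustration. Note that a plain digest only catches documents
that are identical after normalization; for near-duplicates you would need a
fuzzy signature instead (Solr's SignatureUpdateProcessorFactory with
TextProfileSignature is one option to look at).

```python
import hashlib
import re
from urllib.parse import urlencode


def content_signature(text: str) -> str:
    """Return a SHA-1 hex digest of the text after light normalization."""
    # Lowercase and collapse whitespace so trivial formatting
    # differences produce the same digest.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def collapsed_query(keyword: str, field: str = "signature") -> str:
    """Build a query string that groups results on the signature field.

    The field name is an assumption; it must match your own schema.
    """
    params = {
        "q": keyword,
        "group": "true",        # turn on result grouping (field collapsing)
        "group.field": field,   # collapse documents sharing the same digest
        "group.limit": 1,       # keep one document per group
        "group.main": "true",   # return a flat list instead of nested groups
    }
    return urlencode(params)
```

You would store content_signature(body) in the "signature" field when
indexing each document, and append collapsed_query("keyword") to your
select URL when searching.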

*Omri Cohen*



Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-6036295







---------- Forwarded message ----------
From: Pranav Prakash <pra...@gmail.com>
Date: Thu, Jun 23, 2011 at 12:26 PM
Subject: Removing duplicate documents from search results
To: solr-user@lucene.apache.org


How can I remove very similar documents from search results?

My scenario is that the index contains documents that are nearly identical
(people submitting the same content multiple times, or different people
submitting the same content). When a search is performed for "keyword", the
same document quite frequently comes up multiple times in the top N results.
I want to remove those duplicate (or possibly duplicate) documents, much as
Google does when they say "In order to show you most relevant result,
duplicates have been removed". How can I achieve this functionality using
Solr? Does Solr have a built-in feature or a plugin that could help me with
it?


*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> | Google <http://www.google.com/profiles/pranny>
