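What you need to do is calculate a hash of each document's content (using any message digest algorithm you like: MD5, SHA-1 and so on), store it in a field, and then read up on Solr's field collapsing capabilities, which let you return only one result per distinct hash value. It shouldn't be too complicated.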
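Here is a minimal Python sketch of both steps. It assumes the hash is computed on the client before indexing and stored in a hypothetical single-valued string field named `signature`. The lowercasing and whitespace normalization is also an assumption. The query side uses Solr's result grouping parameters (`group`, `group.field`, `group.limit`, `group.main`), which implement field collapsing. An exact digest only catches documents that are identical after normalization, so resubmissions that differ by a word will still get different hashes.

```python
import hashlib
import re
from urllib.parse import urlencode


def content_signature(text: str, algorithm: str = "md5") -> str:
    """Return a hex digest of the document text after light normalization.

    Assumption: lowercasing and collapsing whitespace is enough to make
    trivially reformatted resubmissions share a signature. Documents that
    differ by even one word will still get different digests.
    """
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    digest = hashlib.new(algorithm)  # e.g. "md5" or "sha1"
    digest.update(normalized.encode("utf-8"))
    return digest.hexdigest()


def grouped_query_params(q: str, field: str = "signature", rows: int = 10) -> str:
    """Build a Solr query string that collapses results sharing a signature.

    "signature" is a hypothetical field name. It must exist in your schema
    and hold the value produced by content_signature() at index time.
    """
    params = {
        "q": q,
        "rows": rows,
        "group": "true",
        "group.field": field,
        "group.limit": 1,      # keep only the top-scoring document per signature
        "group.main": "true",  # return a flat result list instead of nested groups
    }
    return urlencode(params)
```

For example, `content_signature("Hello  World") == content_signature("hello world")` is `True`, so both submissions collapse into a single hit when you query with `grouped_query_params("keyword")`. Solr also ships a deduplication update processor (`SignatureUpdateProcessorFactory`) that can compute such a signature on the server at index time. It includes a fuzzier `TextProfileSignature` for near duplicates, so check the documentation for your Solr version before writing your own.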
*Omri Cohen*
Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-6036295

---------- Forwarded message ----------
From: Pranav Prakash <pra...@gmail.com>
Date: Thu, Jun 23, 2011 at 12:26 PM
Subject: Removing duplicate documents from search results
To: solr-user@lucene.apache.org

How can I remove near-duplicate documents from search results?

My index contains documents that are nearly identical: people submit the same content multiple times, and sometimes different people submit the same content. When a search is performed for "keyword", the same document quite often appears several times in the top N results. I want to remove those duplicate (or likely duplicate) documents, much like Google does when it says "In order to show you the most relevant results, duplicates have been removed".

How can I achieve this with Solr? Does Solr have a built-in feature or a plugin that could help me with it?

*Pranav Prakash*