Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
Omri Cohen wrote: > What you need to do is calculate some hash (using any message digest algorithm you want: MD5, SHA-1 and so on), then do some re

Re: Removing duplicate documents from search results

2011-06-28 Thread Paul Libbrecht

Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq
u need to do is calculate some hash (using any message digest algorithm you want: MD5, SHA-1 and so on), then do some reading on Solr field collapse capabilities. Should n
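A minimal sketch of the hash-then-collapse idea quoted above, not anyone's actual setup from this thread. It assumes each document has a "text" field, picks MD5 as one of the digests mentioned, and normalizes case and whitespace before hashing; the names content_signature and collapse_by_signature are hypothetical. The collapse step here runs client-side in plain Python to illustrate what grouping on a signature field would achieve; it does not call Solr. Note that this only catches documents that are identical after normalization, not loosely similar ones.

```python
import hashlib
import re


def content_signature(text: str) -> str:
    """Return an MD5 hex digest of the case- and whitespace-normalized text."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def collapse_by_signature(ranked_docs):
    """Keep only the first (highest-ranked) document for each signature."""
    seen = set()
    unique = []
    for doc in ranked_docs:
        sig = content_signature(doc["text"])
        if sig not in seen:
            seen.add(sig)
            unique.append(doc)
    return unique


# Hypothetical ranked search results; documents 1 and 2 differ only in
# letter case and spacing, so they share a signature.
results = [
    {"id": 1, "text": "How to tune Solr caches"},
    {"id": 2, "text": "how to  tune solr   caches "},
    {"id": 3, "text": "Field collapsing in Solr"},
]
deduped_ids = [doc["id"] for doc in collapse_by_signature(results)]
```

In an index, the signature would more likely be computed once at indexing time and stored in its own field, so that collapsing can be done on that field at query time.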

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte

Re: Removing duplicate documents from search results

2011-06-28 Thread Pranav Prakash

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte

Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte

Re: Removing duplicate documents from search results

2011-06-28 Thread Mohammad Shariq

Re: Removing duplicate documents from search results

2011-06-23 Thread simon

Re: Removing duplicate documents from search results

2011-06-23 Thread pravesh
group of duplicates)? The latest one?

Re: Removing duplicate documents from search results

2011-06-23 Thread Pranav Prakash

Re: Removing duplicate documents from search results

2011-06-23 Thread Omri Cohen
anav Prakash Date: Thu, Jun 23, 2011 at 12:26 PM Subject: Removing duplicate documents from search results To: solr-user@lucene.apache.org How can I remove very similar documents from search results? My scenario is that there are documents in the index which are almost similar (people s

Removing duplicate documents from search results

2011-06-23 Thread Pranav Prakash
How can I remove very similar documents from search results? My scenario is that there are documents in the index which are nearly identical (people submitting the same stuff multiple times, sometimes different people submitting the same stuff). Now when a search is performed for "keyword", in the top N res