subject:"remove answers with identical scores"

Re: remove answers with identical scores

2011-11-25 Thread Erick Erickson

Have you considered removing them at index time? See: http://wiki.apache.org/solr/Deduplication

Best
Erick

On Fri, Nov 25, 2011 at 3:13 PM, Ted Dunning wrote:
> See http://en.wikipedia.org/wiki/Locality-sensitive_hashing
>
> The obvious thought that I had just after hitting send was that you cou
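To illustrate the index-time idea, here is a minimal client-side sketch that skips exact duplicates before documents are sent for indexing. It is not the mechanism described on the linked wiki page; the field names `title` and `body`, the `signature` field, and both function names are assumptions made for this example. An exact hash like this only catches documents that are identical after whitespace and case normalization, not the "nearly identical" ones.

```python
import hashlib


def content_signature(doc, fields=("title", "body")):
    """Hash the normalized text of the chosen fields (hypothetical field names)."""
    h = hashlib.sha1()
    for name in fields:
        # Lowercase and collapse whitespace so trivial differences do not matter.
        value = " ".join(str(doc.get(name, "")).lower().split())
        h.update(value.encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent fields cannot run together
    return h.hexdigest()


def unique_for_indexing(docs, fields=("title", "body")):
    """Yield only the first document seen for each signature."""
    seen = set()
    for doc in docs:
        sig = content_signature(doc, fields)
        if sig in seen:
            continue
        seen.add(sig)
        # Attach the signature so it could be stored alongside the document.
        yield {**doc, "signature": sig}
```

Keeping the first document per signature is an arbitrary choice; a real pipeline might prefer the canonical page over a redirect.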

Re: remove answers with identical scores

2011-11-25 Thread Ted Dunning

See http://en.wikipedia.org/wiki/Locality-sensitive_hashing The obvious thought that I had just after hitting send was that you could put the LSH signatures on the documents. That would let you do the scan at low volume and using LSH would make the duplicate scan almost as fast as your score scan
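As a rough illustration of putting LSH signatures on documents, the sketch below uses MinHash over word shingles with banding, which is one common LSH scheme and not necessarily the one meant in the message. The function names, the shingle size, and the numbers of hashes and bands are all assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """64-bit hash of a token, salted by the seed to simulate independent hashes."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, num_hashes=32):
    """MinHash signature: the minimum hash of the shingle set per seed."""
    tokens = shingles(text)
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def lsh_candidates(signatures, bands=8):
    """Group document ids whose signatures agree on at least one band."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(doc_id)
    return [ids for ids in buckets.values() if len(ids) > 1]
```

Documents that share a bucket are only candidates; a final comparison of the full signatures or the texts would decide whether they are really duplicates.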

Re: remove answers with identical scores

2011-11-25 Thread Fred Zimmerman

Thanks. I did consider postprocessing and may wind up doing that; I was hoping there was a way to have Solr do it for me! That I have to ask this question is probably not a good sign, but what is LSH clustering?

On Fri, Nov 25, 2011 at 4:34 AM, Ted Dunning wrote:
> You can do that pretty easily

Re: remove answers with identical scores

2011-11-25 Thread Ted Dunning

You can do that pretty easily by just retrieving extra documents and post processing the results list. You are likely to have a significant number of apparent duplicates this way. To really get rid of duplicates in results, it might be better to remove them from the corpus by deploying something
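The post-processing approach might look roughly like the sketch below: ask for more rows than needed, then keep only the first document from each run of equal scores. It assumes the results arrive as a list of dicts sorted by descending score with a `score` key; the function name and the tolerance are hypothetical.

```python
import math


def drop_same_score_runs(docs, wanted, score_key="score"):
    """Keep the first document of each run of equal scores, up to `wanted` results."""
    kept = []
    previous = None
    for doc in docs:
        score = doc[score_key]
        # Treat scores as identical when they match within a tiny relative tolerance.
        if previous is not None and math.isclose(score, previous, rel_tol=1e-9):
            continue
        kept.append(doc)
        previous = score
        if len(kept) == wanted:
            break
    return kept
```

Note that this also discards distinct documents that happen to tie on score, so it is a heuristic rather than a true duplicate check.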

remove answers with identical scores

2011-11-24 Thread Fred Zimmerman

I have a corpus that has a lot of identical or nearly identical documents. I'd like to return only the unique ones (excluding the "nearly identical" which are redirects). I notice that all the identical/nearly identicals have identical Solr scores. How can I tell Solr to throw out all the success
