> Have you tried using http://wiki.apache.org/solr/Deduplication ?
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
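The Deduplication wiki page referenced above describes signature-based deduplication: Solr's SignatureUpdateProcessorFactory hashes the values of a chosen set of fields into a signature field, and documents sharing a signature count as duplicates. What follows is only a minimal client-side sketch of that idea in plain Python. It is not Solr's implementation. The field list reuses the names from the question ('description', 'tags', 'meta'). The helper names `signature` and `mark_duplicates` are hypothetical. It also assumes that 'duplicate' = 0 marks the first copy and 1 marks the later ones.

```python
import hashlib
from typing import Dict, Iterable, List

# Assumption: the fields named in the question are the ones that define "same document".
SIGNATURE_FIELDS = ("description", "tags", "meta")


def signature(doc: Dict[str, str], fields=SIGNATURE_FIELDS) -> str:
    """Return an MD5 hex digest over the chosen field values.

    MD5 is one of the signature options the Solr wiki mentions, but the byte
    layout used here (values joined with NUL separators) is this sketch's own.
    """
    h = hashlib.md5()
    for name in fields:
        # A missing field contributes an empty string, so every doc gets a signature.
        h.update(doc.get(name, "").encode("utf-8"))
        # Separator keeps ("ab", "c") and ("a", "bc") from hashing identically.
        h.update(b"\x00")
    return h.hexdigest()


def mark_duplicates(docs: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Copy each doc, adding 'signature' and a hypothetical 0/1 'duplicate' flag.

    The first doc seen with a given signature gets duplicate=0; later ones get 1.
    """
    seen = set()
    out: List[Dict[str, object]] = []
    for doc in docs:
        sig = signature(doc)
        out.append(dict(doc, signature=sig, duplicate=1 if sig in seen else 0))
        seen.add(sig)
    return out
```

Suppose three docs are passed in, two of them with identical 'description', 'tags' and 'meta' values. The first copy comes back with `duplicate` 0, the second with 1, and the unrelated doc with 0. Inside Solr itself, the wiki's approach computes the signature at index time instead of in client code like this.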
> - Original Message
>> From: Joe Calderon
>> To: solr-user@lucene.apache.org
>> Sent: Friday, July 31, 2009 5:06:48 PM
>> Subject: dealing with duplicates
>>
>> Hello all, I have a collection of a few million documents, many of which
>> are duplicates. They have been clustered with a simple algorithm. I have a
>> field called 'duplicate', which is 0 or 1, and fields called 'description',
>> 'tags', and 'meta'. Documents are clustered on different criteria and