Hey Otis,
Yep, I realized this myself after playing around with the dedupe feature
yesterday.
So it does look like field collapsing is pretty much what I need.
Any idea how close it is to being production-ready?
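For anyone finding this thread later, here is a minimal plain-Python sketch of the difference Otis describes. This is not Solr code: the document dicts, the `score` key, and the helpers `signature`, `index_with_dedupe` and `collapse` are hypothetical, and which duplicate survives in real Solr depends on its own configuration. Index-time dedupe drops a duplicate before it is ever indexed, regardless of relevance. Query-time collapsing indexes everything and keeps the best-ranked hit per group, which is the roll-up behaviour asked about.

```python
import hashlib


def signature(doc, fields):
    """Hash the chosen fields into one signature string (a stand-in for a
    signature field computed at update time)."""
    joined = "\x1f".join(str(doc.get(f, "")) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def index_with_dedupe(docs, fields):
    """Index-time dedupe: skip any document whose signature was already seen,
    so later duplicates never reach the 'index' at all."""
    seen = set()
    index = []
    for doc in docs:
        sig = signature(doc, fields)
        if sig in seen:
            continue
        seen.add(sig)
        index.append(dict(doc, duplicate_signature=sig))
    return index


def collapse(hits, field):
    """Query-time collapsing: hits are assumed to be sorted by relevance
    already; keep only the first (best) hit for each value of `field`."""
    seen = set()
    out = []
    for hit in hits:
        key = hit.get(field)
        if key in seen:
            continue
        seen.add(key)
        out.append(hit)
    return out


docs = [
    {"id": "a", "duplicate_group_id": "g1", "score": 0.4},
    {"id": "b", "duplicate_group_id": "g1", "score": 0.9},
    {"id": "c", "duplicate_group_id": "g2", "score": 0.7},
]

# Index-time: "b" is dropped before indexing, even though it scores higher.
indexed = index_with_dedupe(docs, ["duplicate_group_id"])
print([d["id"] for d in indexed])  # ['a', 'c']

# Query-time: everything is indexed; the most relevant hit per group survives.
hits = sorted(docs, key=lambda d: d["score"], reverse=True)
print([h["id"] for h in collapse(hits, "duplicate_group_id")])  # ['b', 'c']
```

The key design point is where the decision happens: at index time the engine has no query and no relevance score, so it cannot pick "the first one according to relevancy"; only a query-time step like collapsing can.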

Thanks,
-Chak

Otis Gospodnetic wrote:
> 
> Hi,
> 
> As far as I know, the point of deduplication in Solr (
> http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate
> document before indexing it in order to avoid duplicates in the index in
> the first place.
> 
> What you are describing is closer to the field collapsing patch in SOLR-236.
> 
>  Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> 
> 
> 
> ----- Original Message ----
>> From: KaktuChakarabati <jimmoe...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Tue, November 24, 2009 5:29:00 PM
>> Subject: Deduplication in 1.4
>> 
>> 
>> Hey,
>> I've been trying to find some documentation on using this feature in 1.4,
>> but the wiki page is a little sparse.
>> Specifically, here's what I'm trying to do:
>> 
>> I have a field, say 'duplicate_group_id', that I'll populate based on some
>> offline document deduplication process I have.
>> 
>> All I want is for Solr to compute a 'duplicate_signature' field based on
>> this one at update time, so that when I search for documents later, all
>> documents with the same original 'duplicate_group_id' value will be rolled up
>> (i.e. I'll just get the first one that comes back according to relevancy).
>> 
>> I enabled the deduplication processor and put it into the updater, but I'm
>> not seeing any difference in the returned results (i.e. results with the
>> same duplicate_group_id are returned separately).
>> 
>> Is there anything I need to supply at query time for this to take effect?
>> What should the behaviour be? Is there any working example of this?
>> 
>> Anything would be helpful.
>> 
>> Thanks,
>> Chak
>> -- 
>> View this message in context: 
>> http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 
