Re: Question on index time de-duplication

2015-11-01 Thread shamik
That's what I observed as well. Perhaps there's a way to customize SignatureUpdateProcessorFactory to support my use case. I'll look into the source code and figure out if there's a way to do it.

Re: Question on index time de-duplication

2015-10-31 Thread Zheng Lin Edwin Yeo

Re: Question on index time de-duplication

2015-10-30 Thread shamik
sible using SignatureUpdateProcessorFactory. -- View this message in context: http://lucene.472066.n3.nabble.com/Question-on-index-time-de-duplication-tp4237306p4237409.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Question on index time de-duplication

2015-10-30 Thread shamik
rocessorFactory ? -- View this message in context: http://lucene.472066.n3.nabble.com/Question-on-index-time-de-duplication-tp4237306p4237403.html

Re: Question on index time de-duplication

2015-10-30 Thread shamik
alent, which is a requirement for me. -- View this message in context: http://lucene.472066.n3.nabble.com/Question-on-index-time-de-duplication-tp4237306p4237401.html

RE: Question on index time de-duplication

2015-10-30 Thread Markus Jelsma
> At the top of the De-Duplication wiki page is a note about collapsing results. Once you have the signature (identical for each of the duplicates) you'll want to collapse your results, keeping the

Re: Question on index time de-duplication

2015-10-30 Thread Scott Stults
At the top of the De-Duplication wiki page is a note about collapsing results. Once you have the signature (identical for each of the duplicates), you'll want to collapse your results, keeping the one with the max date. https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
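
The collapse step described above could be sketched roughly as follows. This is only an illustration: it builds the request parameters for a collapse filter query and does not contact a Solr server. The field names `signature` and `release_date` are assumptions, not taken from this thread; substitute whatever the schema actually uses.

```python
from urllib.parse import urlencode


def collapse_params(user_query, signature_field="signature", date_field="release_date"):
    """Build Solr query parameters that collapse documents sharing a signature.

    The `max` local parameter asks the collapse parser to keep, for each
    signature value, the document with the highest value in `date_field`.
    Both field names are hypothetical defaults.
    """
    return {
        "q": user_query,
        "fq": "{!collapse field=%s max=%s}" % (signature_field, date_field),
    }


# Example: parameters for a generic search that should show one hit per duplicate set.
params = collapse_params("install guide")
query_string = urlencode(params)  # URL-encoded form, ready to append to a select URL
```

Because the collapse is applied as a filter query at search time, all yearly copies stay in the index and remain searchable individually when the filter is left out.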

Re: Question on index time de-duplication

2015-10-29 Thread Zheng Lin Edwin Yeo
Yes, you can try using the SignatureUpdateProcessorFactory to hash the content into a signature field, and then group on the signature field at search time. You can find more information here: https://cwiki.apache.org/confluence/display/solr/De-Duplication I have been using this method to g
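
To make the idea concrete, here is a minimal sketch of what "hash the content into a signature field, then group on it" amounts to. It is not how Solr computes signatures internally (the update processor does that inside Solr, and its signature classes may hash differently); the MD5 hashing, the document layout, and the field names `title`, `body`, `release_year` and `signature` are all assumptions for illustration.

```python
import hashlib


def content_signature(doc, fields):
    """Return a hex digest over the chosen content fields of a document.

    Documents whose chosen fields are identical get the same signature,
    regardless of fields left out of `fields` (for example the release year).
    """
    digest = hashlib.md5()
    for name in fields:
        digest.update(str(doc.get(name, "")).encode("utf-8"))
    return digest.hexdigest()


def group_params(user_query, signature_field="signature"):
    """Build Solr result-grouping parameters keyed on the signature field."""
    return {"q": user_query, "group": "true", "group.field": signature_field}


# Two copies of the same document from different release years (hypothetical data).
doc_2014 = {"title": "Install guide", "body": "Run setup.", "release_year": 2014}
doc_2015 = {"title": "Install guide", "body": "Run setup.", "release_year": 2015}

sig_2014 = content_signature(doc_2014, ["title", "body"])
sig_2015 = content_signature(doc_2015, ["title", "body"])
```

Leaving the release year out of the hashed fields is what lets both yearly copies share one signature while still being indexed as separate documents.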

Question on index time de-duplication

2015-10-29 Thread Shamik Bandopadhyay
Hi, I'm looking to customize index time de-duplication. Here's my use case and what I'm trying to achieve. I have identical documents coming from different release years of a given product. I need to index them in Solr as they are required in each individual year's context. But there's a generic search