Hi,

  I'm looking to customize index-time de-duplication. Here's my use case
and what I'm trying to achieve.

I have identical documents coming from different release years of a given
product. I need to index all of them in Solr, as they are required in the
context of their individual years. But there's a generic search which spans
all the years and hence brings back duplicate/identical content. My goal is
to return only the latest document and filter out the rest. For example, if
product A has identical documents for 2015, 2014 and 2013, the search should
return only the 2015 document (the latest) and filter out the rest.

What I'm thinking of doing (if possible) at index time:

Index all documents, but add a special tag (e.g. dedup=true) to the 2013 and
2014 content, keeping 2015 (the latest release) untouched. At query time,
I'll add a filter which excludes content tagged with dedup=true.
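
To make the idea concrete, here is a rough sketch of the tagging logic in
plain Python. It is not Solr code. The field names "signature" and "year"
and the function name are made up for illustration, and it assumes every
version of a document is available in the same batch, which may not hold
inside a real update processor.

```python
from collections import defaultdict

def tag_older_duplicates(docs):
    """Set dedup=True on every document except the newest in each signature group.

    Hypothetical fields: "signature" identifies identical content,
    "year" is the release year.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["signature"]].append(doc)

    for group in groups.values():
        latest_year = max(d["year"] for d in group)
        for d in group:
            # Only older releases get the tag; the latest stays untouched.
            if d["year"] < latest_year:
                d["dedup"] = True
    return docs

docs = [
    {"id": "A-2013", "signature": "abc", "year": 2013},
    {"id": "A-2014", "signature": "abc", "year": 2014},
    {"id": "A-2015", "signature": "abc", "year": 2015},
    {"id": "B-2014", "signature": "xyz", "year": 2014},
]
tagged = tag_older_duplicates(docs)

# What the generic search would see once tagged documents are filtered out.
visible = [d["id"] for d in tagged if not d.get("dedup")]
```

On the query side, the generic search would then carry a negative filter
query such as fq=-dedup:true, while the year-specific searches would omit it.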

I'm wondering whether this is achievable, perhaps by extending
UpdateRequestProcessorFactory or customizing
SignatureUpdateProcessorFactory?

Any pointers will be appreciated.

Regards,
Shamik
