Hi all,

We have successfully integrated Solr in our application, and now we are
facing a requirement where the application should be able to search for
duplicate records in Solr core based on equality in 3 distinct fields.

Tried using SignatureUpdateProcessorFactory as described in
https://cwiki.apache.org/confluence/display/solr/De-Duplication and
Lookup3Signature and everything seems to work fine, signature field is
being filled with unique hash values.

One issue we have, is that we need to pass to
SignatureUpdateProcessorFactory the stemmed value of 1 of 3 fields.
Currenty, the following documents produce different hash values, and we
need them to produce unique.
Analysis for field1 and values "value1_a" and "value1_b" produce stemmed
value "value1"

documentA: {
    field1: value1_a,
    field2: value2,
    field3: value3,
    signature: hash_value1
}

documentB: {
    field1: value1_b,
    field2: value2,
    field3: value3,
    signature: hash_value2
}

I would like to ask whether it is possible to have required behavior, and
some tips about how to accomplish this task.

Thank you in advance,

Leonidas

Reply via email to