Hi there,
We are on Solr 6.0.1. Here are the relevant parts of our schema and solrconfig.xml:
<uniqueKey>_unique_key</uniqueKey>
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="typeClass">solr.StrField</str>
    <int name="maxLength">32766</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
With the above configuration, the following operations produce duplicate
documents (two documents with the same _unique_key):
1. Add a document:
    final SolrInputDocument document = new SolrInputDocument();
    document.setField("_unique_key", "key1");
    final UpdateRequest request = new UpdateRequest();
    request.add(document);
    solrClient.request(request, collectionName);
2. Overwrite the document with a version that has a child document:
    final SolrInputDocument document = new SolrInputDocument();
    document.setField("_unique_key", "key1");
    final SolrInputDocument childDocument = new SolrInputDocument();
    childDocument.setField("name", "name");
    childDocument.setField("parent_id", "key1");
    document.addChildDocument(childDocument);
    final UpdateRequest request = new UpdateRequest();
    request.add(document);
    solrClient.request(request, collectionName);
After this, the collection contains three documents: one for the child
document we added and two for the parent document, both with
"_unique_key" set to "key1".
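For anyone who wants to reproduce step 2 without SolrJ, the same parent/child document can also be written in Solr's JSON update format, where child documents go under the "_childDocuments_" key. The following is only a minimal sketch that builds the payload as a string and does not contact Solr; the class and method names (NestedDocJson, members, parentWithChild) are made up for illustration, and the field values are assumed to need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: builds a JSON update payload by hand.
class NestedDocJson {

    // Render string fields as JSON object members, without the surrounding braces.
    // Assumption: keys and values contain no characters that need JSON escaping.
    static String members(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
        }
        return sb.toString();
    }

    // One parent document with a single child nested under "_childDocuments_".
    static String parentWithChild(Map<String, String> parent, Map<String, String> child) {
        return "[{" + members(parent)
                + ",\"_childDocuments_\":[{" + members(child) + "}]}]";
    }

    public static void main(String[] args) {
        Map<String, String> parent = new LinkedHashMap<>();
        parent.put("_unique_key", "key1");

        Map<String, String> child = new LinkedHashMap<>();
        child.put("name", "name");
        child.put("parent_id", "key1");

        // The printed payload could be posted to the collection's /update handler.
        System.out.println(parentWithChild(parent, child));
    }
}
```

Posting the step 1 document first and then this payload should correspond to the two SolrJ requests above, assuming the same update chain is applied to both.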
After some research, we found SignatureUpdateProcessorFactory and added it
to the chain in our solrconfig.xml:
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <str name="signatureField">signatureField</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">_entityKey</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="typeClass">solr.StrField</str>
    <int name="maxLength">32766</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
After the change, we ran the same code against a new collection. The
duplicate document issue is gone, but the child document no longer shows up
in the results when searching for *:*.
The block join ({!parent which="_unique_key:*"}name:*) still works fine,
but the join ({!join from=parent_id to=_unique_key}) returns nothing.
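To make the three queries above easy to rerun, here is a minimal sketch that turns them into URL-encoded /select request URLs using only the JDK. The base URL, the collection name "mycollection", and the helper name selectUrl are assumptions for illustration; the query strings are copied as written above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for reproducing the queries; it only builds URLs.
class SolrQueryUrls {

    // Build a /select URL with the q parameter form-encoded as UTF-8.
    static String selectUrl(String baseUrl, String collection, String q) {
        try {
            return baseUrl + "/" + collection + "/select?q="
                    + URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed local Solr instance
        String collection = "mycollection";         // hypothetical collection name

        String[] queries = {
            "*:*",                                      // match-all search
            "{!parent which=\"_unique_key:*\"}name:*",  // block join
            "{!join from=parent_id to=_unique_key}"     // join, as written above
        };

        for (String q : queries) {
            System.out.println(selectUrl(base, collection, q));
        }
    }
}
```

Each printed URL can be opened in a browser or fetched with curl to compare the result counts of the three queries.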
Any ideas?
Thanks,
Jack