BTW, we only have one node and the collection has just one shard.

On Wed, Mar 29, 2017 at 10:52 AM, Wenjie Zhang <wenjiezhang2...@gmail.com>
wrote:

> Hi there,
>
> We are in solr 6.0.1, here is our solr schema and config:
>
> <uniqueKey>_unique_key</uniqueKey>
>
> <updateRequestProcessorChain name="dedupe">
>    <processor class="solr.TruncateFieldUpdateProcessorFactory">
>     <str name="typeClass">solr.StrField</str>
>     <int name="maxLength">32766</int>
>   </processor>
>     <processor class="solr.LogUpdateProcessorFactory"/>
>     <processor class="solr.DistributedUpdateProcessorFactory"/>
>     <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
>     <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
>       <str name="pattern">[^\w-\.]</str>
>       <str name="replacement">_</str>
>     </processor>
>     <processor class="solr.RunUpdateProcessorFactory"/>
>   </updateRequestProcessorChain>
>
> When having above configuration, and doing following operations, we will
> see duplicate documents (two documents have same _unique_key)
>
> 1, Add document:
>
> *final SolrInputDocument document = new SolrInputDocument();*
>
> * document.setField("_unique_key", "key1");*
>
> * final UpdateRequest request = new UpdateRequest();*
>
> *request.add(document);*
>
> *solrClient.request(request,collectionName);*
> 2, Overwrite the document with
>
> *final SolrInputDocument document = new SolrInputDocument();*
>
> *document.setField("_unique_key", **"key1"**);*
>
> *final SolrInputDocument childDocument = **new SolrInputDocument();*
>
> *childDocument**.setField("name", "name");*
>
> *childDocument**.setField("parent_id", "**key1**");*
>
> * document.addChildDocument(childDocument);*
>
> *final UpdateRequest request = new UpdateRequest();*
>
> *request.add(document);*
>
> *solrClient.request(request,collectionName);*
>
> After this, we will see three documents in our collection, one for the
> child document we added, two for the parent document and both have
> "_unique_key" as "key1".
>
>
> After doing some researching, we found the "SignatureUpdateProcessorFacto
> ry", so we modified our solrConfig.xml to add "
> SignatureUpdateProcessorFactory".
>
>  <updateRequestProcessorChain name="dedupe">
>     <processor class="org.apache.solr.update.processor.
> SignatureUpdateProcessorFactory">
>       <str name="signatureField">signatureField</str>
>       <bool name="overwriteDupes">true</bool>
>       <str name="fields">_entityKey</str>
>       <str name="signatureClass">org.apache.solr.update.processor.
> Lookup3Signature</str>
>     </processor>
>     <processor class="solr.TruncateFieldUpdateProcessorFactory">
>     <str name="typeClass">solr.StrField</str>
>     <int name="maxLength">32766</int>
>    </processor>
>     <processor class="solr.LogUpdateProcessorFactory"/>
>     <processor class="solr.DistributedUpdateProcessorFactory"/>
>     <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
>     <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
>       <str name="pattern">[^\w-\.]</str>
>       <str name="replacement">_</str>
>     </processor>
>     <processor class="solr.RunUpdateProcessorFactory"/>
>
>   </updateRequestProcessorChain>
>
> After the change, we run the code in a new collection, the duplicate
> document issue is gone, but the child document is also not shown in the
> search result when searching (*:*),.
> However, the block join ({!parent which="_unique_key:*"}name:*) works
> fine, but not the join ({!join from=parent_id to=_unique_key}), it
> returns nothing.
>
> Any idea?
>
>
> Thanks,
> Jack
>

Reply via email to