Hi Alex,

Thanks for your time. Please see my responses inline.

Thanks,
Jack

On Wed, Mar 29, 2017 at 11:36 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> There are too many things here. As far as I understand:
> *) You should not need to use Signature chain

[Jack] I have the same feeling, but I do not understand why it should change the behavior of child documents.

> *) You should have a uniqueID assigned to the child record

[Jack] Ideally, a child document should not require a unique key, since it is only a child document and Solr links it to its parent document internally. I will try adding a unique key to the child document as well to see how it behaves.

> *) You should not assign parentID to the child record, it will be
> assigned automatically

[Jack] You are right, but we add this parentId to the child document because of the orphan child issue (https://issues.apache.org/jira/browse/SOLR-5211), so that we can work around it by doing a join. I am just confused by the change in behavior after adding the Signature chain.

> *) Double check that your unique_key field type is string (not text or
> similar), though this does not seem to be the issue

[Jack] Yes, it is a string.

> Make sure to run the test against a clean/empty index, just to see if
> maybe something is hanging around. I would also test that against the
> latest Solr, just in case something was fixed in the meantime.

[Jack] Makes sense. I will also test it on the latest version.

> Regards,
>    Alex.
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 29 March 2017 at 14:30, Wenjie Zhang <wenjiezhang2...@gmail.com> wrote:
> > BTW, we only have one node and the collection has just one shard.
> >
> > On Wed, Mar 29, 2017 at 10:52 AM, Wenjie Zhang <wenjiezhang2...@gmail.com>
> > wrote:
> >
> >> Hi there,
> >>
> >> We are on Solr 6.0.1. Here is our Solr schema and config:
> >>
> >> <uniqueKey>_unique_key</uniqueKey>
> >>
> >> <updateRequestProcessorChain name="dedupe">
> >>   <processor class="solr.TruncateFieldUpdateProcessorFactory">
> >>     <str name="typeClass">solr.StrField</str>
> >>     <int name="maxLength">32766</int>
> >>   </processor>
> >>   <processor class="solr.LogUpdateProcessorFactory"/>
> >>   <processor class="solr.DistributedUpdateProcessorFactory"/>
> >>   <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
> >>   <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
> >>     <str name="pattern">[^\w-\.]</str>
> >>     <str name="replacement">_</str>
> >>   </processor>
> >>   <processor class="solr.RunUpdateProcessorFactory"/>
> >> </updateRequestProcessorChain>
> >>
> >> With the above configuration, the following operations produce
> >> duplicate documents (two documents with the same _unique_key):
> >>
> >> 1. Add a document:
> >>
> >> final SolrInputDocument document = new SolrInputDocument();
> >> document.setField("_unique_key", "key1");
> >> final UpdateRequest request = new UpdateRequest();
> >> request.add(document);
> >> solrClient.request(request, collectionName);
> >>
> >> 2. Overwrite the document with one that has a child document:
> >>
> >> final SolrInputDocument document = new SolrInputDocument();
> >> document.setField("_unique_key", "key1");
> >> final SolrInputDocument childDocument = new SolrInputDocument();
> >> childDocument.setField("name", "name");
> >> childDocument.setField("parent_id", "key1");
> >> document.addChildDocument(childDocument);
> >> final UpdateRequest request = new UpdateRequest();
> >> request.add(document);
> >> solrClient.request(request, collectionName);
> >>
> >> After this, we see three documents in our collection: one for the
> >> child document we added and two for the parent document, both with
> >> "_unique_key" set to "key1".
> >>
> >> After some research, we found "SignatureUpdateProcessorFactory", so
> >> we modified our solrconfig.xml to add it to the chain:
> >>
> >> <updateRequestProcessorChain name="dedupe">
> >>   <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
> >>     <str name="signatureField">signatureField</str>
> >>     <bool name="overwriteDupes">true</bool>
> >>     <str name="fields">_entityKey</str>
> >>     <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
> >>   </processor>
> >>   <processor class="solr.TruncateFieldUpdateProcessorFactory">
> >>     <str name="typeClass">solr.StrField</str>
> >>     <int name="maxLength">32766</int>
> >>   </processor>
> >>   <processor class="solr.LogUpdateProcessorFactory"/>
> >>   <processor class="solr.DistributedUpdateProcessorFactory"/>
> >>   <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
> >>   <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
> >>     <str name="pattern">[^\w-\.]</str>
> >>     <str name="replacement">_</str>
> >>   </processor>
> >>   <processor class="solr.RunUpdateProcessorFactory"/>
> >> </updateRequestProcessorChain>
> >>
> >> After the change, we ran the same code against a new collection. The
> >> duplicate document issue is gone, but the child document no longer
> >> shows up in the results when searching for *:*.
> >> The block join ({!parent which="_unique_key:*"}name:*) works fine,
> >> but the join ({!join from=parent_id to=_unique_key}) returns nothing.
> >>
> >> Any ideas?
> >>
> >>
> >> Thanks,
> >> Jack