SPLITSHARD - data loss of child documents
Hi Everyone,

We're using version 8.6.1 with nested documents.
I used the SPLITSHARD API and, after it finished successfully, I noticed the following:

1. Most of the child documents are missing - before the split: ~600M, after: 68M
2. Retrieving a document with its children shows child documents that do not belong to this parent (their parentID value is different from the parent's ID).

I didn't see any limitation in the API documentation.
Do you have any suggestions?

Thanks in advance,
Ronen.

This electronic message may contain proprietary and confidential information of Verint Systems Inc., its affiliates and/or subsidiaries. The information is intended to be for the use of the individual(s) or entity(ies) named above. If you are not the intended recipient (or authorized to receive this e-mail for the intended recipient), you may not use, copy, disclose or distribute to anyone this message or any information contained in this message. If you have received this electronic message in error, please notify us by replying to this e-mail.
Re: SPLITSHARD - data loss of child documents
I was under the impression that split shard doesn't work with child documents. If that limitation is missing from the ref guide, we should update it.
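The second symptom reported in this thread (children whose parentID differs from the parent's ID) can be checked mechanically once a parent and its children have been fetched. The following is only a minimal sketch: the function name `mismatched_children` is hypothetical, the documents are assumed to be plain dicts already parsed from a query response, and `parentID` is the field name mentioned in the original report.

```python
from typing import Any, Dict, List


def mismatched_children(
    parent_id: str,
    children: List[Dict[str, Any]],
    parent_field: str = "parentID",  # field name taken from the report; adjust to your schema
) -> List[Dict[str, Any]]:
    """Return the children whose parent reference does not match parent_id.

    A child with no parent_field at all is also reported as mismatched.
    """
    return [child for child in children if child.get(parent_field) != parent_id]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real index.
    sample_children = [
        {"id": "c1", "parentID": "p1"},
        {"id": "c2", "parentID": "p2"},
    ]
    for child in mismatched_children("p1", sample_children):
        print("child does not belong to parent p1:", child["id"])
```

Running a check like this over a sample of parents before and after a split would show whether the mismatch is widespread or limited to particular shards.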
Re: DIH and UUIDProcessorFactory
On 12/12/2020 4:36 PM, Shawn Heisey wrote:
> On 12/12/2020 2:30 PM, Dmitri Maziuk wrote:
>> Right, ```Every update request received by Solr is run through a chain
>> of plugins known as Update Request Processors, or URPs.```
>>
>> The part I'm missing is whether DIH's 'name="/dataimport"' counts as an
>> "Update Request", my reading is it doesn't and URP chain applies only to '
>
> If you define an update chain as default, then it will be used for all
> updates made where a different chain is not specifically requested.
>
> I have used this personally to have my custom update chain apply even
> when the indexing comes from DIH. I know for sure that this works on
> 4.x and 5.x versions; it should work on newer versions as well.

Confirmed w/ 8.7.0: I finally got to importing the one DB where I need this, and UUIDs are there with the default URP chain.

Thank you
Dima
Re: DIH and UUIDProcessorFactory
Try with the explicit URP chain too. It may work as well.

Regards,
Alex.
Re: DIH and UUIDProcessorFactory
On 12/17/2020 4:05 PM, Alexandre Rafalovitch wrote:
> Try with the explicit URP chain too. It may work as well.

Actually in this case we're just making sure uniqueKey is in fact unique in all documents, so default is what we want.

For this particular dataset I may at some future point look into generating ID as a hash of some unique tuple or other, but then I expect we'll still want to keep the UUID fallback.

Dima
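The idea in the last message (an ID derived from a hash of some unique tuple, with a UUID fallback) could look roughly like the sketch below. It is not what the poster implemented: the function name `make_doc_id`, the choice of SHA-256, and the separator are all assumptions made for illustration, and the hashing here happens client-side in Python rather than in a Solr update chain.

```python
import hashlib
import uuid
from typing import Optional, Sequence


def make_doc_id(key_parts: Sequence[Optional[str]]) -> str:
    """Build a document ID from a tuple of key values.

    If every part is present, return a deterministic SHA-256 hex digest,
    so re-importing the same row yields the same ID.
    If any part is missing or empty, fall back to a random UUID.
    """
    if not key_parts or any(part is None or part == "" for part in key_parts):
        return str(uuid.uuid4())
    # Join with the ASCII unit separator so ("ab", "c") and ("a", "bc")
    # do not produce the same input string.
    joined = "\x1f".join(key_parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical key tuples, for illustration only.
    print(make_doc_id(["customers", "42"]))   # same digest on every run
    print(make_doc_id(["customers", None]))   # random UUID, differs per run
```

A deterministic ID means a repeated import overwrites the earlier document instead of duplicating it, while the UUID fallback still guarantees uniqueness for rows that lack a usable key.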