SPLITSHARD - data loss of child documents

2020-12-17 Thread Nussbaum, Ronen
Hi Everyone,

We're using version 8.6.1 with nested documents.
I used the SPLITSHARD API and, after it finished successfully, I noticed the 
following:

  1.  Most of the child documents are missing - before the split: ~600M, after: 68M
  2.  Retrieving a document with its children shows child documents that do 
not belong to this parent (their parentID value differs from the parent's ID).

I didn't see any limitation in the API documentation.
Do you have any suggestions?

Thanks in advance,
Ronen.


This electronic message may contain proprietary and confidential information of 
Verint Systems Inc., its affiliates and/or subsidiaries. The information is 
intended to be for the use of the individual(s) or entity(ies) named above. If 
you are not the intended recipient (or authorized to receive this e-mail for 
the intended recipient), you may not use, copy, disclose or distribute to 
anyone this message or any information contained in this message. If you have 
received this electronic message in error, please notify us by replying to this 
e-mail.
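
One way to put numbers on the loss described above is to record hit counts before and after the split. The following is a minimal sketch, not a tested procedure: the base URL and the collection name `mycoll` are hypothetical, and it assumes that only child documents carry the `parentID` field mentioned in the message. It relies only on the standard select handler with `rows=0`, which returns the match count in `response.numFound`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(base_url, collection, query):
    """Build a select URL that returns only the hit count (rows=0)."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base_url}/{collection}/select?{params}"


def num_found(body):
    """Extract numFound from a Solr JSON select response body."""
    return json.loads(body)["response"]["numFound"]


def fetch_count(base_url, collection, query):
    """Run the count query against a live Solr node."""
    with urlopen(count_url(base_url, collection, query)) as resp:
        return num_found(resp.read().decode("utf-8"))


def lost_documents(before, after):
    """Number of documents missing after the split (never negative)."""
    return max(before - after, 0)


# Hypothetical usage (needs a running Solr; names are assumptions):
# before = fetch_count("http://localhost:8983/solr", "mycoll", "parentID:[* TO *]")
# ... run SPLITSHARD, wait for completion ...
# after = fetch_count("http://localhost:8983/solr", "mycoll", "parentID:[* TO *]")
# print(lost_documents(before, after))
```

Running the same count with `q=*:*` as well would show whether parents were affected or only children.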


Re: SPLITSHARD - data loss of child documents

2020-12-17 Thread Mike Drob
I was under the impression that split shard doesn’t work with child
documents. If that is missing from the ref guide, we should update it.

On Thu, Dec 17, 2020 at 4:30 AM Nussbaum, Ronen wrote:

> Hi Everyone,
>
> We're using version 8.6.1 with nested documents.
> I used the SPLITSHARD API and after it finished successfully, I've noticed
> the following:
>
>   1.  Most of child documents are missing - before the split: ~600M,
> after: 68M
>   2.  Retrieving a document with its children, shows child documents that
> do not belong to this parent (their parentID value is different than
> parent's ID).
>
> I didn't see any limitation in the API documentation.
> Do you have any suggestions?
>
> Thanks in advance,
> Ronen.


Re: DIH and UUIDProcessorFactory

2020-12-17 Thread Dmitri Maziuk

On 12/12/2020 4:36 PM, Shawn Heisey wrote:
> On 12/12/2020 2:30 PM, Dmitri Maziuk wrote:
>> Right, ```Every update request received by Solr is run through a chain
>> of plugins known as Update Request Processors, or URPs.```
>>
>> The part I'm missing is whether DIH's 'name="/dataimport"' counts as an "Update Request", my reading is it
>> doesn't and URP chain applies only to '
>
> If you define an update chain as default, then it will be used for all
> updates made where a different chain is not specifically requested.
>
> I have used this personally to have my custom update chain apply even
> when the indexing comes from DIH.  I know for sure that this works on
> 4.x and 5.x versions; it should work on newer versions as well.




Confirmed w/ 8.7.0: I finally got to importing the one DB where I need 
this, and UUIDs are there with the default URP chain.


Thank you
Dima
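
The behaviour confirmed above (a default chain applies unless another chain is requested, and the UUID processor only fills in a missing value) can be pictured with a toy model. This is a sketch of the idea only, not Solr's implementation: the chain names `uuid` and `plain`, the field name `id`, and all function names are made up for illustration.

```python
import uuid


def uuid_processor(field_name):
    """Return a processor that fills field_name with a random UUID when it is absent."""
    def process(doc):
        if not doc.get(field_name):
            # Copy rather than mutate, so the caller's row stays untouched.
            doc = {**doc, field_name: str(uuid.uuid4())}
        return doc
    return process


def select_chain(chains, default_name, requested=None):
    """Pick the explicitly requested chain, else the one marked as default."""
    return chains[requested if requested is not None else default_name]


def run_chain(chain, doc):
    """Pass the document through every processor in order."""
    for processor in chain:
        doc = processor(doc)
    return doc


# Hypothetical setup: a chain named "uuid" acting as default, plus an empty one.
CHAINS = {"uuid": [uuid_processor("id")], "plain": []}

row = {"title": "imported row without a key"}
indexed = run_chain(select_chain(CHAINS, default_name="uuid"), row)
print(indexed["id"])  # a freshly generated UUID string
```

A row that already has an `id` passes through unchanged, and a request that names the `plain` chain bypasses the UUID step entirely.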




Re: DIH and UUIDProcessorFactory

2020-12-17 Thread Alexandre Rafalovitch
Try with the explicit URP chain too. It may work as well.

Regards,
   Alex.

On Thu, 17 Dec 2020 at 16:51, Dmitri Maziuk wrote:
>
> On 12/12/2020 4:36 PM, Shawn Heisey wrote:
> > On 12/12/2020 2:30 PM, Dmitri Maziuk wrote:
> >> Right, ```Every update request received by Solr is run through a chain
> >> of plugins known as Update Request Processors, or URPs.```
> >>
> >> The part I'm missing is whether DIH's 'name="/dataimport"' counts as an "Update Request", my reading is it
> >> doesn't and URP chain applies only to '
> >
> > If you define an update chain as default, then it will be used for all
> > updates made where a different chain is not specifically requested.
> >
> > I have used this personally to have my custom update chain apply even
> > when the indexing comes from DIH.  I know for sure that this works on
> > 4.x and 5.x versions; it should work on newer versions as well.
> >
>
> Confirmed w/ 8.7.0: I finally got to importing the one DB where I need
> this, and UUIDs are there with the default URP chain.
>
> Thank you
> Dima
>
>


Re: DIH and UUIDProcessorFactory

2020-12-17 Thread Dmitri Maziuk

On 12/17/2020 4:05 PM, Alexandre Rafalovitch wrote:
> Try with the explicit URP chain too. It may work as well.


Actually, in this case we're just making sure uniqueKey is in fact unique 
in all documents, so the default chain is what we want.


For this particular dataset I may at some future point look into 
generating ID as a hash of some unique tuple or other, but then I expect 
we'll still want to keep the UUID fallback.


Dima
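
The idea of a hashed ID with a UUID fallback could look roughly like the sketch below. It is only an illustration under assumptions: the column names, the choice of SHA-256, and the function name `make_id` are hypothetical, and nothing in the thread says how the hash would actually be computed.

```python
import hashlib
import uuid


def make_id(row, key_fields):
    """Hash the key tuple into a stable ID; fall back to a random UUID if any part is missing."""
    values = [row.get(name) for name in key_fields]
    if any(value is None or value == "" for value in values):
        return str(uuid.uuid4())
    # A unit separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    joined = "\x1f".join(str(value) for value in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical rows: the first has a complete key tuple, the second does not.
complete = {"source": "db1", "record_no": 42}
partial = {"source": "db1"}
print(make_id(complete, ["source", "record_no"]))  # same value on every run
print(make_id(partial, ["source", "record_no"]))   # random UUID, differs per run
```

The hashed form makes re-imports overwrite the same document instead of creating duplicates, while the fallback keeps rows with an incomplete key indexable.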