Re: nested entities and DIH indexing time

matthew sporleder Thu, 14 May 2020 14:14:49 -0700

On Thu, May 14, 2020 at 4:46 PM Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 5/14/2020 9:36 AM, matthew sporleder wrote:
> > It appears that adding entities to my entities in my data import
> > config is slowing down my import process by a lot.  Is there a good
> > way to speed this up?  I see the IDs are individually queried instead
> > of using IN() or similar normal techniques to make things faster.
> >
> > Just looking for some tips.  I prefer this architecture to the way we
> > currently do it with complex SQL, inserting weird strings, and then
> > splitting on them (gross but faster).
>
> When you have nested entities, this is how DIH works.  A separate SQL
> query for the inner entity is made for each row returned on the outer
> entity.  Nested entities tend to be extremely slow for this reason.
>
> The best way to work around this is to make the database server do the
> heavy lifting -- using JOIN or other methods so that you only need one
> entity and one SQL query.  Doing this will mean that you'll need to
> split the data after import, using either the DIH config or the analysis
> configuration in the schema.
>
> Thanks,
> Shawn
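
For anyone comparing the two access patterns, here is a minimal Python sketch. It assumes a hypothetical `item` table (outer entity) and `tag` table (inner entity), with SQLite standing in for the real database. It only illustrates the query pattern and does not touch DIH itself. `nested_per_row` issues one inner query per outer row, which is the behaviour described for nested entities. `batched_in` fetches child rows for many parents at once with `IN (...)`, the technique asked about above. `batch_size` is an arbitrary assumption, chosen to stay under typical bind-parameter limits.

```python
import sqlite3


def make_db():
    # Hypothetical schema: each "item" (outer entity) has zero or more "tags" (inner entity).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag (item_id INTEGER, name TEXT);
        INSERT INTO item VALUES (1, 'first'), (2, 'second'), (3, 'third');
        INSERT INTO tag VALUES (1, 'red'), (1, 'blue'), (2, 'green');
    """)
    return conn


def nested_per_row(conn):
    """One inner query per outer row: 1 + N queries in total."""
    queries = 1
    docs = {}
    for item_id, title in conn.execute("SELECT id, title FROM item ORDER BY id").fetchall():
        queries += 1
        tags = [row[0] for row in conn.execute(
            "SELECT name FROM tag WHERE item_id = ? ORDER BY name", (item_id,))]
        docs[item_id] = {"title": title, "tags": tags}
    return docs, queries


def batched_in(conn, batch_size=500):
    """Child rows for up to batch_size parents per query, using IN (...)."""
    queries = 1
    items = conn.execute("SELECT id, title FROM item ORDER BY id").fetchall()
    docs = {item_id: {"title": title, "tags": []} for item_id, title in items}
    ids = [item_id for item_id, _ in items]
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        placeholders = ",".join("?" * len(chunk))  # e.g. "?,?,?"
        queries += 1
        for item_id, name in conn.execute(
                f"SELECT item_id, name FROM tag WHERE item_id IN ({placeholders}) "
                "ORDER BY item_id, name", chunk):
            docs[item_id]["tags"].append(name)
    return docs, queries


if __name__ == "__main__":
    conn = make_db()
    per_row, q1 = nested_per_row(conn)
    batched, q2 = batched_in(conn)
    print(per_row == batched, q1, q2)
```

With three items the per-row version runs 4 queries and the batched version runs 2. The gap grows linearly with the number of outer rows, which is why nested entities get slow on large imports.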


That's too bad, because the nested-entity config is very clean and the
JOIN/CONCAT/SPLIT method is very gross.
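
A minimal sketch of the single-query JOIN/CONCAT/SPLIT approach, using SQLite and hypothetical `item`/`tag` tables. A `LEFT JOIN` with `GROUP_CONCAT` packs each item's tags into one delimited column, which is then split back into a list. In Solr the split would happen in the DIH config or in schema analysis, as described above. Doing it in Python here is only for illustration. The `|` delimiter is an assumption and must never appear in the data itself.

```python
import sqlite3

SEP = "|"  # hypothetical delimiter; must not occur inside any tag value


def make_db():
    # Hypothetical schema: each "item" has zero or more "tags".
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag (item_id INTEGER, name TEXT);
        INSERT INTO item VALUES (1, 'first'), (2, 'second'), (3, 'third');
        INSERT INTO tag VALUES (1, 'red'), (1, 'blue'), (2, 'green');
    """)
    return conn


def joined_and_split(conn):
    """One query: LEFT JOIN + GROUP_CONCAT, then split the packed column."""
    rows = conn.execute("""
        SELECT i.id, i.title, GROUP_CONCAT(t.name, '|')
        FROM item i LEFT JOIN tag t ON t.item_id = i.id
        GROUP BY i.id, i.title
        ORDER BY i.id
    """).fetchall()
    docs = {}
    for item_id, title, packed in rows:
        # GROUP_CONCAT yields NULL (None) when the LEFT JOIN matched no tags.
        # Its concatenation order is not guaranteed, so sort after splitting.
        tags = sorted(packed.split(SEP)) if packed is not None else []
        docs[item_id] = {"title": title, "tags": tags}
    return docs


if __name__ == "__main__":
    print(joined_and_split(make_db()))
```

This trades the N+1 round trips for a single query, at the cost of the delimiter bookkeeping that makes the approach feel gross.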

I was also hoping to use different delta queries for each nested entity.

Can a non-nested entity write into existing docs, or does each entity
always have to produce its own documents?
