On Thu, May 14, 2020 at 4:46 PM Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 5/14/2020 9:36 AM, matthew sporleder wrote:
> > It appears that adding entities to my entities in my data import
> > config is slowing down my import process by a lot.  Is there a good
> > way to speed this up?  I see the ID's are individually queried instead
> > of using IN() or similar normal techniques to make things faster.
> >
> > Just looking for some tips.  I prefer this architecture to the way we
> > currently do it with complex SQL, inserting weird strings, and then
> > splitting on them (gross but faster).
>
> When you have nested entities, this is how DIH works.  A separate SQL
> query for the inner entity is made for each row returned on the outer
> entity.  Nested entities tend to be extremely slow for this reason.
>
> The best way to work around this is to make the database server do the
> heavy lifting -- using JOIN or other methods so that you only need one
> entity and one SQL query.  Doing this will mean that you'll need to
> split the data after import, using either the DIH config or the analysis
> configuration in the schema.
>
> Thanks,
> Shawn

This is too bad, because the nested-entity config is very clean and the JOIN/CONCAT/SPLIT method is very gross. I was also hoping to use a different delta query for each nested entity. Can a non-nested entity write into existing docs, or does each entity always have to produce its own documents?
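For reference, here is a rough sketch of the three query patterns discussed in this thread, done outside of DIH with Python's built-in sqlite3 so the query counts are easy to see. It does not reproduce DIH itself. The `item`/`tag` tables and the `|` delimiter are made up for illustration (assumptions, not from anyone's real schema). Pattern 1 is what nested entities do (one outer query plus one per outer row), pattern 2 is the batched IN() I was hoping for, and pattern 3 is the JOIN/CONCAT/SPLIT approach Shawn suggests.

```python
import sqlite3

SEP = "|"  # hypothetical delimiter; it must never appear in the real data


def build_db():
    """Hypothetical example schema: items, each with zero or more tags."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tag (item_id INTEGER, tag TEXT);
        INSERT INTO item VALUES (1, 'alpha'), (2, 'beta');
        INSERT INTO tag VALUES (1, 'red'), (1, 'blue'), (2, 'green');
    """)
    return conn


def nested_entity_style(conn):
    """Outer query, then one inner query per outer row (N+1 queries)."""
    items = conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()
    queries = 1
    docs = []
    for item_id, name in items:
        rows = conn.execute(
            "SELECT tag FROM tag WHERE item_id = ? ORDER BY tag", (item_id,))
        queries += 1
        docs.append({"id": item_id, "name": name, "tags": [t for (t,) in rows]})
    return docs, queries


def batched_in_style(conn):
    """Outer query, then a single IN() query for all child rows (2 queries)."""
    items = conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()
    if not items:
        return [], 1  # an empty IN () list is a syntax error in SQLite
    ids = [item_id for item_id, _ in items]
    placeholders = ",".join("?" * len(ids))
    tags_by_id = {item_id: [] for item_id in ids}
    sql = (f"SELECT item_id, tag FROM tag WHERE item_id IN ({placeholders}) "
           "ORDER BY item_id, tag")
    for item_id, tag in conn.execute(sql, ids):
        tags_by_id[item_id].append(tag)
    docs = [{"id": i, "name": n, "tags": tags_by_id[i]} for i, n in items]
    return docs, 2


def join_concat_style(conn):
    """One JOIN query; children are concatenated, then split client-side."""
    sql = """
        SELECT i.id, i.name, GROUP_CONCAT(t.tag, '|')
        FROM item i LEFT JOIN tag t ON t.item_id = i.id
        GROUP BY i.id, i.name
        ORDER BY i.id
    """
    docs = []
    for item_id, name, joined in conn.execute(sql):
        # GROUP_CONCAT order is not guaranteed, so sort after splitting;
        # it returns NULL (None) when an item has no tags.
        tags = sorted(joined.split(SEP)) if joined else []
        docs.append({"id": item_id, "name": name, "tags": tags})
    return docs, 1
```

All three build the same documents; the difference is only how many round trips to the database they make, which is where the nested-entity slowdown comes from on large tables.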