On 5/14/2020 9:36 AM, matthew sporleder wrote:
It appears that adding child entities to my entities in my data import
config is slowing down my import process a lot. Is there a good
way to speed this up? I see the IDs are queried individually instead
of using IN() or similar standard techniques to make things faster.
Just looking for some tips. I prefer this architecture to the way we
currently do it with complex SQL, inserting weird strings, and then
splitting on them (gross but faster).
When you have nested entities, this is how DIH works. A separate SQL
query for the inner entity is run for each row returned by the outer
entity. Nested entities tend to be extremely slow for this reason.
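To make the cost concrete, here is a small sketch that mimics the nested-entity pattern outside of Solr. It uses an in-memory SQLite database with hypothetical `item` and `tag` tables (not anything from DIH itself) and counts the queries: one for the outer entity plus one per outer row, so N outer rows cost N+1 round trips.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    # Hypothetical schema: each item can have zero or more tags.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tag (item_id INTEGER, tag TEXT);
        INSERT INTO item VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
        INSERT INTO tag VALUES (1, 'red'), (1, 'blue'), (2, 'green');
    """)
    return conn


def nested_entity_import(conn: sqlite3.Connection):
    """Mimic nested entities: one inner query for every outer row."""
    queries = 0
    docs = []
    outer = conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()
    queries += 1  # the outer entity's query
    for item_id, name in outer:
        # The inner entity is queried separately for each outer row.
        cur = conn.execute(
            "SELECT tag FROM tag WHERE item_id = ? ORDER BY tag", (item_id,)
        )
        tags = [t for (t,) in cur]
        queries += 1
        docs.append({"id": item_id, "name": name, "tags": tags})
    return docs, queries
```

With three items this runs four queries; with a million outer rows it would run a million and one, which is where the slowdown comes from.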
The best way to work around this is to make the database server do the
heavy lifting: use JOIN or other methods so that you only need one
entity and one SQL query. This does mean that you'll need to split the
combined values apart again, using either the DIH config or the analysis
configuration in the schema.
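A minimal sketch of the single-query approach, again using SQLite and the same hypothetical `item`/`tag` tables as a stand-in for the real database: a LEFT JOIN with `group_concat` collapses each item's tags into one delimited string, and the string is split afterward. In a real setup that split would happen in the DIH config or in schema analysis rather than in Python; the `|` delimiter is an assumption and must be a character that never appears in the data.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    # Hypothetical schema: each item can have zero or more tags.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tag (item_id INTEGER, tag TEXT);
        INSERT INTO item VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
        INSERT INTO tag VALUES (1, 'red'), (1, 'blue'), (2, 'green');
    """)
    return conn


def joined_import(conn: sqlite3.Connection):
    """One query: the database joins and concatenates; we split afterward."""
    rows = conn.execute("""
        SELECT i.id, i.name, group_concat(t.tag, '|')
        FROM item i LEFT JOIN tag t ON t.item_id = i.id
        GROUP BY i.id, i.name
        ORDER BY i.id
    """).fetchall()
    docs = []
    for item_id, name, joined in rows:
        # group_concat returns NULL (None) when an item has no tags.
        # Its ordering is not guaranteed, so sort after splitting.
        tags = sorted(joined.split("|")) if joined else []
        docs.append({"id": item_id, "name": name, "tags": tags})
    return docs
```

This produces the same documents as the nested version with a single round trip, which is the trade the original poster's "weird strings" approach is already making.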
Thanks,
Shawn