On 4/18/2017 11:21 PM, ankur.168 wrote:
> I thought DIH does parallel db request for all the entities defined in a
> document.

I do not know anything about that.  It *could* be possible for all the
sub-entities just below another entity to run in parallel, but I've got
no idea whether this is the case.  At the top level, there is only one
thread handling documents one at a time, this I am sure of.

> I do believe that DIH is easier to use; that's why I am trying to find a 
> way to use it in my current system. But as I explained above, since I have 
> so many sub-entities, each returning a list of responses that gets joined 
> into the parent, a full import of more than 2 lakh (200,000) documents is 
> taking forever.
>
> What I am looking for is a way to speed up my full import using DIH only. 
> To achieve this I tried to split the documents into two sets and run the 
> full imports in parallel, but with this approach the latest import 
> overrides the other set's indexed data, since the unique key (property_id) 
> is the same in both sets.

The way to achieve top speed with DIH is to *not* define nested
entities.  Only define one entity with a single SELECT statement.  Let
the database handle all the JOIN work.  In my DIH config, I do "SELECT *
FROM X WHERE Y" ... X is a view defined on the database server that
handles all the JOINs, and Y is a fairly detailed conditional.
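As a sketch, a flat DIH config along those lines might look like this -- the view name "property_view", the column names, and the connection details are all made up for illustration; the point is a single entity whose query hits a view that does the JOINs on the database side:

```xml
<!-- Hypothetical data-config.xml: one flat entity, no nested
     sub-entities. All JOIN work happens inside property_view
     on the database server. -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://dbhost/mydb"
              user="solr" password="CHANGEME"/>
  <document>
    <entity name="property"
            query="SELECT * FROM property_view WHERE active = 1"/>
  </document>
</dataConfig>
```

With no sub-entities, DIH issues one SQL query for the whole import instead of one query per parent row per child entity.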

> One way I could think of is to keep document in different core which will 
> maintain different index files and merge the search results from both cores 
> while performing search on indexed data. But is this a good approach?

In order to do a sharded query, the uniqueKey field would need to be
unique across all cores.  My index is sharded manually, each shard does
a separate import when fully rebuilding the index.  The sharding
algorithm is coded into the SQL statement.
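For example (purely illustrative -- the table name and two-shard split are assumptions), a simple modulus on the uniqueKey keeps every document on exactly one shard, so the same property_id can never appear in two cores:

```sql
-- Shard 0's DIH query:
SELECT * FROM property_view WHERE MOD(property_id, 2) = 0;

-- Shard 1's DIH query:
SELECT * FROM property_view WHERE MOD(property_id, 2) = 1;
```

Both imports can then run in parallel without overwriting each other, and a query with the shards parameter merges results across the cores.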

Thanks,
Shawn
