On 4/8/2019 11:47 PM, Srinivas Kashyap wrote:
I'm using DIH to index the data, and the structure of the DIH config for the
Solr core is like this:
<entity>
16 child entities
</entity>
During indexing, the number of requests made to the database was high
(17 queries to process one document). Those requests used most of the
database connections, thereby blocking our web application.
If you have 17 entities, then one document will indeed take 17 queries.
That's the nature of multiple DIH entities.
To tackle it, we enabled SortedMapBackedCache via the cacheImpl parameter to
reduce the number of requests to the database.
When you use SortedMapBackedCache on an entity, you are asking Solr to
store the results of the entire query in memory, even if you don't need
all of the results. If the database has a lot of rows, that's going to
take a lot of memory.
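As a rough sketch of what that looks like in the config: the table names DEF and ABC come from your excerpt, but the column names (ID, NAME, DEF_ID, VALUE) and entity names here are made up for illustration, so adjust them to your schema.

```xml
<document>
  <!-- Outer entity: one Solr document per row of DEF -->
  <entity name="def" query="SELECT ID, NAME FROM DEF">
    <!-- Inner entity: the query runs once, and every row it returns is
         held in memory, keyed by DEF_ID (hypothetical column name) -->
    <entity name="abc"
            query="SELECT DEF_ID, VALUE FROM ABC"
            cacheImpl="SortedMapBackedCache"
            cacheKey="DEF_ID"
            cacheLookup="def.ID"/>
  </entity>
</document>
```

With cacheKey and cacheLookup set, each outer row is matched against the in-memory map instead of going back to the database, which is why the whole result set of the inner query has to fit in the heap.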
In your excerpt from the config, your inner entity doesn't have a WHERE
clause, which means that it's going to retrieve all of the rows of the
ABC table for *EVERY* single entry in the DEF table. That's going to be
exceptionally slow. Normally the SQL query on an inner entity will have
some kind of WHERE clause that limits the results to rows that match the
entry from the outer entity.
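Something along these lines is what an uncached inner entity usually looks like. Again, the column names (ID, NAME, DEF_ID, VALUE) and entity names are assumptions on my part, not taken from your config:

```xml
<document>
  <entity name="def" query="SELECT ID, NAME FROM DEF">
    <!-- ${def.ID} is replaced with the ID column of the current outer row,
         so only the matching ABC rows come back for each document -->
    <entity name="abc"
            query="SELECT DEF_ID, VALUE FROM ABC WHERE DEF_ID = '${def.ID}'"/>
  </entity>
</document>
```

This still runs one query per inner entity for every outer row, so it keeps memory usage low at the cost of more round trips to the database.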
You may need to write a custom indexing program that runs separately
from Solr, possibly on an entirely different server. That might be a
lot more efficient than DIH.
Thanks,
Shawn