Hi,

I would like to share a suggestion for overcoming OOM issues with
CachedSqlEntityProcessor.

Consider the following scenario, where we have sub-entities in DIH:

<entity x query="select * from x"> ---> object
    <entity y query="select * from y" processor="cachedSqlEntityprocessor"
            cachekey=y.id cachevalue=x.id> --> object properties

cachedSqlEntityprocessor works as follows:

•       First, entity x is executed and the entire table is stored in the
cache.
•       Next, entity y is executed and its entire table is also stored in the
cache.
•       Finally, the comparison happens through a hash map.
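
The steps above can be sketched roughly as follows. This is only an illustration in Python of a full-cache hash join, not DIH's actual code; the function name and the column names ("id", "x_id", "name") are assumptions made up for the example.

```python
from collections import defaultdict


def cached_join(parent_rows, child_rows, parent_key="id", child_key="x_id"):
    """Join children to parents by caching the whole child table in a dict.

    Hypothetical helper: key names are assumptions, not DIH attributes.
    """
    cache = defaultdict(list)
    # The entire child table ends up in memory, which is the OOM risk.
    for child in child_rows:
        cache[child[child_key]].append(child)
    # Each parent row is matched against the cache with one hash lookup.
    for parent in parent_rows:
        yield parent, cache.get(parent[parent_key], [])


parents = [{"id": 1}, {"id": 2}]
children = [
    {"x_id": 1, "name": "a"},
    {"x_id": 1, "name": "b"},
    {"x_id": 3, "name": "c"},
]
joined = list(cached_join(parents, children))
```

Memory use here grows with the size of the child table, regardless of how many child rows actually match a parent.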

Instead, it could process the child entities in batches (for example, 1000
parent ids per batch), so that it doesn't have to cache the entire child
table in memory and only needs to fetch the child entities corresponding to
each batch.

Something like this...

<entity x query="select * from x"> ---> object --> cache the complete data
in parent
    <entity y query="select * from y where uid in (pass 10000 ids from
            parent entity and fetch just those from database)"
            processor="cachedSqlEntityprocessor"
            cachekey=y.id cachevalue=x.id> --> object properties
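
A minimal sketch of the batched idea, written in Python against an in-memory SQLite database purely for illustration. It is not a DIH feature or API; the function name, the "name" column, and the sample rows are assumptions. Only the table names x and y and the uid column follow the example above.

```python
import sqlite3


def batched_join(conn, batch_size):
    """Yield (parent_id, child_names), caching only one batch of children.

    Hypothetical helper. Note that databases limit the number of bind
    parameters per statement, so batch_size must stay under that limit.
    """
    parent_cursor = conn.execute("SELECT id FROM x ORDER BY id")
    while True:
        chunk = parent_cursor.fetchmany(batch_size)
        if not chunk:
            break
        ids = [row[0] for row in chunk]
        placeholders = ",".join("?" * len(ids))
        # One child query per batch, restricted to this batch's parent ids.
        child_sql = (
            "SELECT uid, name FROM y "
            f"WHERE uid IN ({placeholders}) ORDER BY uid, name"
        )
        cache = {}
        for uid, name in conn.execute(child_sql, ids):
            cache.setdefault(uid, []).append(name)
        # The per-batch cache is discarded after its parents are emitted.
        for parent_id in ids:
            yield parent_id, cache.get(parent_id, [])


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE x (id INTEGER PRIMARY KEY);
    CREATE TABLE y (uid INTEGER, name TEXT);
    INSERT INTO x VALUES (1), (2), (3);
    INSERT INTO y VALUES (1, 'a'), (1, 'b'), (3, 'c');
    """
)
result = list(batched_join(conn, batch_size=2))
```

With this shape, peak memory depends on the batch size rather than on the size of the whole child table.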

As far as I know, DIH currently processes the data row by row. If we make
DIH process the data in batches, it would help resolve the OOM issues.

One drawback is that DIH will issue more SQL queries with this method, but
it would be a hybrid approach that addresses both the memory and the
performance issues.
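
To make the trade-off concrete, here is a rough count of child queries per approach. The row count and batch size below are illustrative assumptions, not measurements, and the function name is made up for the example.

```python
import math


def child_query_count(parent_rows, batch_size):
    """Number of child queries needed when parents are handled in batches."""
    return math.ceil(parent_rows / batch_size)


parent_rows = 1_000_000  # assumed size of the parent table

full_cache = 1  # one query, but the whole child table is held in memory
batched = child_query_count(parent_rows, 1000)  # one query per 1000 parents
row_by_row = child_query_count(parent_rows, 1)  # one query per parent row
```

The batched approach sits between the two extremes: more queries than caching the whole table, far fewer than one query per parent row.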

Please let me know your thoughts.

Thanks,
Barani





