Hi, I would like to share a suggestion for avoiding OOM issues with CachedSqlEntityProcessor.
Consider the following scenario, where we have sub-entities in DIH:

    <entity name="x" query="select * from x">   ---> object
      <entity name="y" query="select * from y" processor="CachedSqlEntityProcessor" cachekey="y.id" cachevalue="x.id">   ---> object properties

As far as I understand, CachedSqlEntityProcessor works as follows:

• First, entity x is executed and the entire table is stored in the cache.
• Next, entity y is executed and its entire table is stored in the cache.
• Finally, the comparison happens through a hash map.

Instead, it could process the child entities in batches (say, 1000 parent ids per batch), so that it doesn't have to cache the entire child table in memory and only needs to fetch the child rows corresponding to each batch. Something like this:

    <entity name="x" query="select * from x">   ---> object, cache the complete data in parent
      <entity name="y" query="select * from y where uid in (pass the ids of the current parent batch and fetch just those from the database)" processor="CachedSqlEntityProcessor" cachekey="y.id" cachevalue="x.id">   ---> object properties

As far as I know, DIH currently processes the data row by row. If DIH processed the data in batches instead, it would help resolve the OOM issues. One drawback is that DIH would issue more SQL queries with this method, but it would be a kind of hybrid approach that addresses both the memory and the performance issues.

Please let me know your thoughts.

Thanks,
Barani
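
P.S. To make the batching idea concrete, here is a rough sketch in Python. It is not DIH code and does not use any Solr API; it only illustrates the access pattern against an in-memory SQLite database. The tables x and y and the uid column come from the example above, while the name and prop columns, the function names and the batch size are assumptions made up for illustration.

```python
import sqlite3


def join_in_batches(conn, batch_size=500):
    """Yield (parent_row, child_props), holding one batch of children in memory.

    Assumed (hypothetical) schema: x(id, name) is the parent table and
    y(uid, prop) is the child table, with y.uid referring to x.id.
    """
    parents = conn.execute("SELECT id, name FROM x ORDER BY id")
    while True:
        parent_batch = parents.fetchmany(batch_size)
        if not parent_batch:
            break

        # One child query per parent batch, restricted to that batch's ids.
        ids = [row[0] for row in parent_batch]
        placeholders = ",".join("?" * len(ids))
        child_query = (
            "SELECT uid, prop FROM y "
            f"WHERE uid IN ({placeholders}) ORDER BY uid, prop"
        )

        # Small per-batch cache, replacing a cache of the whole child table.
        children = {}
        for uid, prop in conn.execute(child_query, ids):
            children.setdefault(uid, []).append(prop)

        for parent in parent_batch:
            yield parent, children.get(parent[0], [])


def demo():
    """Build a tiny example database and join it in batches of two parents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE y (uid INTEGER, prop TEXT)")
    conn.executemany("INSERT INTO x VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    conn.executemany(
        "INSERT INTO y VALUES (?, ?)", [(1, "p1"), (1, "p2"), (3, "p3")]
    )
    return list(join_in_batches(conn, batch_size=2))
```

With N parent rows and a batch size of B, this issues roughly N/B child queries instead of one, which is the trade-off mentioned above. The batch size also has to stay below the database's limit on bound parameters or IN-list length, so it would probably need to be configurable.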