On 04/05/2010 02:28 PM, bbarani wrote:
Hi,

I am using CachedSqlEntityProcessor in DIH to index the data. Please find
my data-config structure below:

<entity name="x" query="select * from x">  --->  object
<entity name="y" query="select * from y" processor="CachedSqlEntityProcessor"
cachekey="y.id" cachevalue="x.id">  -->  object properties

For each object I retrieve the corresponding object
properties (in my subqueries).

I run into OOM errors very often, and I think that's a trade-off of using
CachedSqlEntityProcessor.

My assumption is that when I use CachedSqlEntityProcessor, the indexing
happens as follows:

First, entity x is executed and the entire table is stored in the cache.

Next, entity y is executed and the entire table is stored in the cache.

Finally, the comparison happens through a hash map.

So do I always need to allocate at least as much memory to the Solr JVM as
the data present in the tables?


Now my final question: even after Solr completes indexing, the memory
used previously is not released. I can still see the JVM consuming
1.5 GB after the indexing completes. I tried some Java HotSpot options but
didn't see any difference.

Any thoughts on, or confirmation of, my assumptions above would help me
decide whether to use CachedSqlEntityProcessor.

Thanks,
BB




You are right. In CachedSqlEntityProcessor, the cache is an unbounded HashMap, with no option to bound it.
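
To make the difference concrete, here is a rough Java sketch of the two shapes of cache. It is not DIH's actual code: the class RowCacheSketch and its methods buildUnbounded and boundedLru are hypothetical names made up for illustration. The first method keeps every child row in a plain HashMap keyed by the join column, which is the pattern that needs heap roughly proportional to the table. The second shows one common way a bound could be added, using an access-ordered LinkedHashMap that evicts the least recently used key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Solr or DIH.
class RowCacheSketch {

    // Unbounded variant: every child row is held in the map, grouped by the
    // value of the join column, until the map itself becomes unreachable.
    static Map<Object, List<Map<String, Object>>> buildUnbounded(
            List<Map<String, Object>> childRows, String keyColumn) {
        Map<Object, List<Map<String, Object>>> cache = new HashMap<>();
        for (Map<String, Object> row : childRows) {
            cache.computeIfAbsent(row.get(keyColumn), k -> new ArrayList<>()).add(row);
        }
        return cache;
    }

    // Bounded variant: an access-ordered LinkedHashMap that drops the least
    // recently used entry once more than maxEntries keys are present.
    static <K, V> Map<K, V> boundedLru(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    private static Map<String, Object> row(Object id, String prop) {
        Map<String, Object> r = new HashMap<>();
        r.put("id", id);
        r.put("prop", prop);
        return r;
    }

    public static void main(String[] args) {
        // Child table "y" with made-up sample rows: two properties for id 1, one for id 2.
        List<Map<String, Object>> yRows = new ArrayList<>();
        yRows.add(row(1, "color"));
        yRows.add(row(1, "size"));
        yRows.add(row(2, "weight"));

        // For each parent row of "x", the lookup is a single hash probe.
        Map<Object, List<Map<String, Object>>> cache = buildUnbounded(yRows, "id");
        System.out.println("properties cached for x.id=1: " + cache.get(1).size());

        // With a bound of 2 keys, adding a third key evicts the least recently used one.
        Map<String, Integer> lru = boundedLru(2);
        lru.put("a", 1);
        lru.put("b", 2);
        lru.get("a");      // touch "a" so that "b" becomes the eldest entry
        lru.put("c", 3);   // evicts "b"
        System.out.println("keys kept by bounded cache: " + lru.keySet());
    }
}
```

A bounded cache trades memory for extra database round trips on evicted keys, so whether it helps depends on how the parent rows are ordered relative to the join key.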

IMO this should be fixed - want to make a JIRA issue? I've brought it up on the list before, but I don't think I ever got around to making an issue.

As to why it's not getting released: that is odd. Perhaps a GC has just not been triggered yet and it will be released? If not, that's a pretty nasty bug. Can you try forcing a GC to see? (say with jconsole?)
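
If it helps, the same check can be sketched in plain Java through the standard java.lang.management API, whose MemoryMXBean exposes a gc() operation. The class name HeapAfterGcSketch and the 32 MB of throwaway arrays are made up for illustration, and this measures the JVM it runs in, not a remote Solr instance, so for Solr itself jconsole attached to the Solr process is still the easier route. Note that gc() is only a request to the JVM, and that used heap and committed heap are different numbers: the committed figure can stay high after a collection even when the used figure drops.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration; not part of Solr.
class HeapAfterGcSketch {

    // Bytes of heap currently occupied by objects (live or not yet collected).
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        // Allocate some throwaway data so there is something to collect.
        byte[][] garbage = new byte[32][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[1024 * 1024];
        }
        long before = usedHeapBytes();

        // Drop the only reference, then ask the JVM to collect.
        // This is a request, not a guarantee, equivalent to System.gc().
        garbage = null;
        memory.gc();

        MemoryUsage after = memory.getHeapMemoryUsage();
        System.out.printf("used before GC: %d MB%n", before / (1024 * 1024));
        System.out.printf("used after GC:  %d MB%n", after.getUsed() / (1024 * 1024));
        // Committed heap is what the JVM has reserved for the heap; it may
        // not shrink even when used heap does.
        System.out.printf("committed:      %d MB%n", after.getCommitted() / (1024 * 1024));
    }
}
```

If used heap falls sharply after the forced GC, the cache was simply garbage waiting to be collected; if it stays near 1.5 GB, something is still holding a reference to it.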

--
- Mark

http://www.lucidimagination.com


