Hi Shawn,

Thanks for your valuable input.
For your information, we are using SQL Server. We will also try using a JOIN instead of the cached entity and check it.

Regards,
P.Yuvaraj Kumar

--------------------------------------------
On Wed, 9/7/14, Shawn Heisey <s...@elyograg.org> wrote:

Subject: Re: Getting OutOfMemoryError: Java heap space in Solr
To: solr-user@lucene.apache.org
Date: Wednesday, 9 July, 2014, 9:24 PM

On 7/9/2014 6:02 AM, yuvaraj ponnuswamy wrote:
> Hi,
>
> I am often getting "java.lang.OutOfMemoryError: Java heap space" in production because a particular TreeMap is taking up too much memory in the JVM.
>
> When I looked into the config files, I have an entity called UserQryDocument where I fetch data from certain tables. It has a sub-entity called "UserLocation" that uses the CachedSqlEntityProcessor to read fields from a cache; it holds about 200,000 records in total.
>
> processor="CachedSqlEntityProcessor" cacheKey="user_pin"
> cacheLookup="UserQueryDocumentNonAuthor.DocKey"
>
> I have several other entities like this, each also using the CachedSqlEntityProcessor in a sub-entity.
>
> When I looked into the heap dump (java_pid57.hprof), I can see that a TreeMap is causing the problem, but I am not able to find out which entity is responsible. I am using the IBM Heap Analyzer to examine the dump.
>
> Can you please let me know whether there is any other way, or another tool, to analyse and debug the OutOfMemoryError and find the exact entity that is causing it?
>
> I have attached the entity definitions from data-config.xml and a Heap Analyzer screenshot.

JDBC drivers have a habit of loading the entire result set into RAM. Also, you are using the cached processor, which effectively does the same thing. With millions of DB rows, this is going to require a LOT of heap memory.

You'll want to change your JDBC connection so that it doesn't load the entire result set, and you may also need to turn off entity caching in Solr. You didn't mention what database you're using. Here's how to fix MySQL and SQL Server so they don't load the entire result set; the requirements for another database are likely to be different:

https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F

The best way to make DIH perform well is to use a JOIN so that you can get all your data with one entity and one SELECT query. Let the database do all the heavy lifting instead of having Solr send millions of queries. GROUP_CONCAT on the SQL side and a regexTransformer 'splitBy' can sometimes be used to get multiple values into a field.

Thanks,
Shawn
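The suggestions in the thread can be made concrete. First, the dataSource change the linked FAQ describes: a minimal sketch, assuming the stock MySQL Connector/J and Microsoft SQL Server JDBC drivers (host, database, and credentials are placeholders).

    <!-- data-config.xml: keep the driver from buffering the whole result set -->

    <!-- MySQL: batchSize="-1" makes DIH call setFetchSize(Integer.MIN_VALUE),
         which switches Connector/J into row-streaming mode -->
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://dbhost/dbname"
                batchSize="-1"
                user="dbuser" password="dbpass"/>

    <!-- SQL Server (Microsoft driver): fetch rows through a server cursor and
         buffer adaptively instead of reading the entire response into memory -->
    <dataSource type="JdbcDataSource"
                driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://dbhost;databaseName=dbname;selectMethod=cursor;responseBuffering=adaptive"
                user="dbuser" password="dbpass"/>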
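Second, turning off entity caching amounts to dropping the CachedSqlEntityProcessor attributes and parameterising the sub-entity query instead. This would also be consistent with the TreeMap seen in the heap dump, since DIH's default entity cache (SortedMapBackedCache) is backed by a TreeMap. A sketch reusing the names from the original config (the table and column names are guesses):

    <!-- before: the whole lookup table is held in an in-memory cache -->
    <entity name="UserLocation"
            processor="CachedSqlEntityProcessor"
            query="SELECT user_pin, location FROM user_location"
            cacheKey="user_pin"
            cacheLookup="UserQueryDocumentNonAuthor.DocKey"/>

    <!-- after: one parameterised query per parent row, nothing cached -->
    <entity name="UserLocation"
            query="SELECT location FROM user_location
                   WHERE user_pin = '${UserQueryDocumentNonAuthor.DocKey}'"/>

The uncached form trades memory for query volume, which is why the single-query JOIN Shawn recommends is usually the better fix.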
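Finally, a sketch of that single-entity JOIN, using GROUP_CONCAT and RegexTransformer's splitBy to carry multiple values in one column. Note that GROUP_CONCAT is MySQL syntax; SQL Server would need an equivalent aggregation (FOR XML PATH on 2014-era versions, STRING_AGG on 2017+). Table and column names are hypothetical, and the target field must be multivalued in the schema:

    <entity name="UserQueryDocument"
            transformer="RegexTransformer"
            query="SELECT u.doc_key, u.title,
                          GROUP_CONCAT(l.location) AS locations
                   FROM user_docs u
                   LEFT JOIN user_location l ON l.user_pin = u.doc_key
                   GROUP BY u.doc_key, u.title">
      <!-- split the concatenated string back into multiple field values -->
      <field column="locations" splitBy=","/>
    </entity>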