Hi,

Now we have a more informative error:

org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:535)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)

1) Does this still happen when you increase -Xmx64m -Xms64m?

2) I see you use custom jars ("MDSolrDIHTransformer" JARs), but I don't see any Transformer used in database.xml. Why is that? I would remove them just to be sure.

3) I see you have org.apache.solr.core.StandardDirectoryFactory declared in solrconfig.xml. Assuming you are using 64-bit Windows, it is recommended to use MMapDirectory (see the sketch at the end of this message):
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

4) In your previous mail you had a batch size set; now there is no batchSize defined in database.xml. For MySQL it is recommended to use -1. I am not sure about Oracle; I personally used 10,000 once for Oracle (also sketched at the end of this message).
http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F

You have a lot of copyFields defined. There can be gotchas when handling an unusually large number of copyFields.

I would really try the CSV option here, given that you have only the full-import SQL defined and it is not a complex one; it queries only one table. I believe Oracle has some tool to export a table to a CSV file efficiently.
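For reference, a minimal sketch of items (3) and (4), with illustrative values only (the batchSize number is the one I used once for Oracle, not something verified against this table):

<!-- solrconfig.xml: memory-mapped index access on a 64-bit JVM -->
<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>

<!-- database.xml: explicit JDBC fetch size on the data source -->
<dataSource name="org_only" type="JdbcDataSource"
            driver="oracle.jdbc.OracleDriver"
            url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
            user="admin" password="admin"
            readOnly="false" batchSize="10000"/>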
On Saturday, April 5, 2014 3:05 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

Does this user list allow attachments? I have four files attached (database.xml, error.txt, schema.xml, solrconfig.xml).

We just ran the process again using the parameters you suggested, but not to a CSV file. It errored out quickly. We are working on the CSV file run.

Removed both <autoCommit> and <autoSoftCommit> parts/definitions from solrconfig.xml.

Disabled the tlog by removing

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

from solrconfig.xml.

Used the commit=true parameter: ?commit=true&command=full-import


On Fri, Apr 4, 2014 at 3:29 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

This may not solve your problem, but generally it is recommended to disable auto commit and transaction logs for bulk indexing, and to issue one commit at the very end. Do you have tlogs enabled? I see "commit failed" in the error message; that's why I am suggesting it.

And regarding comma-separated values: with this approach you focus on just the Solr importing process and separate out the data acquisition phase. Even big CSV files load very fast: http://wiki.apache.org/solr/UpdateCSV

I have never experienced OOM during indexing, so I suspect data acquisition has a role in it.

Ahmet


On Saturday, April 5, 2014 1:18 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

We would be happy to try that. That sounds counterintuitive for the high volume of records we have. Can you help me understand how that might solve our problem?


On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Can you remove auto commit for the bulk import and commit at the very end?

Ahmet
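For reference, a minimal sketch of the "no auto commit, one commit at the end" setup discussed above, assuming the stock solrconfig.xml layout (the elided values stand for whatever is currently configured):

<!-- solrconfig.xml: commented out for the duration of the bulk load -->
<!--
<autoCommit>
  ...
</autoCommit>
<autoSoftCommit>
  ...
</autoSoftCommit>
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
-->

The single commit at the end is then supplied by the commit=true parameter on the full-import request shown above; DIH issues that commit when the import finishes.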
On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

In case the attached database.xml file didn't show up, I have pasted its contents below:

<dataConfig>
<dataSource
name="org_only"
type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
user="admin"
password="admin"
readOnly="false"
batchSize="100"
/>
<document>

<entity name="full-index" query="
select

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID,

'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY,

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR

from ORCL.ADDRESS_ACCT_ALL
" >

<field column="SOLR_ID" name="id" />
<field column="SOLR_CATEGORY" name="category" />
<field column="ADDRESSALLROWID" name="ADDRESS_ACCT_ALL.RECORD_ID_abc" />
<field column="ADDRESSALLADDRTYPECD" name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
<field column="ADDRESSALLLONGITUDE" name="ADDRESS_ACCT_ALL.LONGITUDE_abc" />
<field column="ADDRESSALLLATITUDE" name="ADDRESS_ACCT_ALL.LATITUDE_abc" />
<field column="ADDRESSALLADDRNAME" name="ADDRESS_ACCT_ALL.ADDR_NAME_abc" />
<field column="ADDRESSALLCITY" name="ADDRESS_ACCT_ALL.CITY_abc" />
<field column="ADDRESSALLSTATE" name="ADDRESS_ACCT_ALL.STATE_abc" />
<field column="ADDRESSALLEMAILADDR" name="ADDRESS_ACCT_ALL.EMAIL_ADDR_abc" />

</entity>

<!-- Variables -->
<!-- '${dataimporter.last_index_time}' -->
</document>
</dataConfig>


On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

In this case we are indexing an Oracle database.

We do not include the data-config.xml in our distribution. We store the database information in the database.xml file. I have attached the database.xml file.

When we use the default merge policy settings, we get the same results.

We have not tried to dump the table to a comma-separated file. We think that dumping a table of this size to disk will introduce other memory problems with big file management. We have not tested that case.


On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Which database are you using? Can you send us data-config.xml?

What happens when you use the default merge policy settings?

What happens when you dump your table to a comma-separated file and feed that file to Solr?

Ahmet
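For reference, a rough sketch of that CSV route, assuming the table can be spooled from SQL*Plus and that the core's update handler accepts text/csv (file name, host, and the abbreviated column list are illustrative, and a real export would still need to handle commas, quotes, and NULLs inside the address fields):

-- SQL*Plus: spool the table to a flat file
SET HEADING OFF
SET FEEDBACK OFF
SET PAGESIZE 0
SET LINESIZE 4000
SPOOL address_acct_all.csv
SELECT RECORD_ID || ',' || ADDR_TYPE_CD || ',' || CITY || ',' || STATE || ',' || EMAIL_ADDR
  FROM ORCL.ADDRESS_ACCT_ALL;
SPOOL OFF

The file (with a header line naming the Solr fields, or a fieldnames parameter on the request) can then be streamed straight into Solr:

curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/csv" --data-binary @address_acct_all.csv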
On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

The ramBufferSizeMB was set to 6MB only on the test system, to make the system crash sooner. In production that tag is commented out, which I believe forces the default value to be used.


On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Out of curiosity, why did you set ramBufferSizeMB to 6?

Ahmet


On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

Main issue: Full indexing is causing a Java heap Out of Memory exception.

SOLR/Lucene version: 4.2.1

JVM version:
Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

Indexer startup command:

set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m

java %JVMARGS% ^
-Dcom.sun.management.jmxremote.port=1092 ^
-Dcom.sun.management.jmxremote.ssl=false ^
-Dcom.sun.management.jmxremote.authenticate=false ^
-jar start.jar

SOLR indexing HTTP parameters request:

webapp=/solr path=/dataimport params={clean=false&command=full-import&wt=javabin&version=2}
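For reference, the same request issued by hand would look roughly like this (host, port, and webapp path are illustrative; DIH's commit parameter defaults to true, so a single commit is made when the import finishes):

curl "http://localhost:8983/solr/dataimport?command=full-import&clean=false"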
We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes away, but we believe the problem has not been fixed and that we will eventually get the same OOM exception. We have other processes on the server that also require resources, so we cannot continually increase the memory settings to resolve the OOM issue. We are trying to find a way to configure the SOLR instance to reduce or, preferably, eliminate the possibility of an OOM exception.

We can reproduce the problem on a test machine. We set the Java heap memory size to 64MB to accelerate the exception. If we increase this setting, the same problem occurs, just hours later. In the test environment, we are using the following parameters:

JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m

Normally we use the default solrconfig.xml file with only the following jar file references added:

<lib path="../../../../default/lib/common.jar" />
<lib path="../../../../default/lib/webapp.jar" />
<lib path="../../../../default/lib/commons-pool-1.4.jar" />

Using these values and trying to index 6 million records from the database, the Java heap Out of Memory exception is thrown very quickly.

We were able to complete a successful indexing by further modifying the solrconfig.xml and removing all or all but one <copyField> tag from the schema.xml file.

The following solrconfig.xml values were modified:

<ramBufferSizeMB>6</ramBufferSizeMB>

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">2</int>
  <int name="maxMergeAtOnceExplicit">2</int>
  <int name="segmentsPerTier">10</int>
  <int name="maxMergedSegmentMB">150</int>
</mergePolicy>

<autoCommit>
  <maxDocs>15000</maxDocs> <!-- this tag was maxTime before -->
  <openSearcher>false</openSearcher>
</autoCommit>
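For comparison, the "default merge policy settings" runs mentioned earlier in the thread amount to simply dropping the custom block so Lucene's TieredMergePolicy defaults apply (a sketch; the rest of the index configuration is left as is):

<!-- custom merge tuning removed; TieredMergePolicy defaults apply
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  ...
</mergePolicy>
-->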
Using our customized schema.xml with two or more <copyField> tags, the OOM exception is always thrown. Based on the errors, the problem occurs when the process is trying to do the merge. The error is provided below:

Exception in thread "Lucene Merge Thread #156" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
        at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
        at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)
        at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)

Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log
SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
        at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2744)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

We think, but are not 100% sure, that the problem is related to the merge.

Normally our schema.xml contains a lot of field specifications (like the ones seen in the file fragment below):

<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_abc" />
<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case.soundex_abc" />
<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_nvl_abc" />

In tests using the default schema.xml file and no <copyField> tags, indexing completed successfully. 6 million records produced a 900 MB data directory.

When I included just one <copyField> tag, indexing completed successfully. 6 million records produced a 990 MB data directory (90 MB bigger).

When I included just two <copyField> tags, the index crashed with an OOM exception.

Changing parameters like maxMergedSegmentMB or maxDocs only postponed the crash.

The net of our test results is as follows:

solrconfig.xml                      schema.xml                          result
default plus only jar references    default (no copyField tags)         success
default plus only jar references    modified with one copyField tag     success
default plus only jar references    modified with two copyField tags    crash
additional modified settings        default (no copyField tags)         success
additional modified settings        modified with one copyField tag     success
additional modified settings        modified with two copyField tags    crash

Our question is: what can we do to eliminate these OOM exceptions?