Hi,

Now we have a more informative error:

org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:535)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)

1) Does this still happen when you increase -Xmx64m -Xms64m?

2) I see you use custom jars ("MDSolrDIHTransformer" JARs), but I don't see any Transformer used in database.xml. Why is that? I would remove them just to be sure.

3) I see you have org.apache.solr.core.StandardDirectoryFactory declared in solrconfig.xml. Assuming you are using 64-bit Windows, it is recommended to use MMapDirectory (see the sketch at the end of this message):
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

4) In your previous mail you had a batch size set; now there is no batchSize defined in database.xml. For MySQL it is recommended to use -1. I am not sure about Oracle; I personally used 10,000 once for Oracle (also sketched at the end of this message).
http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F

You have a lot of copyFields defined. There can be gotchas when handling an unusually large number of copyFields.

I would really try the CSV option here, given that you have only the full-import SQL defined and it is not a complex one; it queries only one table. I believe Oracle has some tool to export a table to a CSV file efficiently.
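For reference, a minimal sketch of items (3) and (4), with illustrative values only (the batchSize number is the one I used once for Oracle, not something verified against this table):

<!-- solrconfig.xml: memory-mapped index access on a 64-bit JVM -->
<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>

<!-- database.xml: explicit JDBC fetch size on the data source -->
<dataSource name="org_only" type="JdbcDataSource"
            driver="oracle.jdbc.OracleDriver"
            url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
            user="admin" password="admin"
            readOnly="false" batchSize="10000"/>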
On Saturday, April 5, 2014 3:05 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

Does this user list allow attachments? I have four files attached (database.xml, error.txt, schema.xml, solrconfig.xml).

We just ran the process again using the parameters you suggested, but not to a CSV file. It errored out quickly. We are working on the CSV file run.

Removed both <autoCommit> and <autoSoftCommit> parts/definitions from solrconfig.xml.

Disabled the tlog by removing

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

from solrconfig.xml.

Used the commit=true parameter: ?commit=true&command=full-import


On Fri, Apr 4, 2014 at 3:29 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

This may not solve your problem, but generally it is recommended to disable auto commit and transaction logs for bulk indexing, and to issue one commit at the very end. Do you have tlogs enabled? I see "commit failed" in the error message; that's why I am suggesting it.

And regarding comma-separated values: with this approach you focus on just the Solr importing process and separate out the data acquisition phase. Even big CSV files load very fast: http://wiki.apache.org/solr/UpdateCSV

I have never experienced OOM during indexing, so I suspect data acquisition has a role in it.

Ahmet


On Saturday, April 5, 2014 1:18 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

We would be happy to try that. That sounds counterintuitive for the high volume of records we have. Can you help me understand how that might solve our problem?


On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Can you remove auto commit for the bulk import and commit at the very end?

Ahmet
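For reference, a minimal sketch of the "no auto commit, one commit at the end" setup discussed above, assuming the stock solrconfig.xml layout (the elided values stand for whatever is currently configured):

<!-- solrconfig.xml: commented out for the duration of the bulk load -->
<!--
<autoCommit>
  ...
</autoCommit>
<autoSoftCommit>
  ...
</autoSoftCommit>
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
-->

The single commit at the end is then supplied by the commit=true parameter on the full-import request shown above; DIH issues that commit when the import finishes.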
On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

In case the attached database.xml file didn't show up, I have pasted its contents below:

<dataConfig>
<dataSource
name="org_only"
type="JdbcDataSource"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
user="admin"
password="admin"
readOnly="false"
batchSize="100"
/>
<document>

<entity name="full-index" query="
select

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID,

'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY,

NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE,
NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR

from ORCL.ADDRESS_ACCT_ALL
" >

<field column="SOLR_ID" name="id" />
<field column="SOLR_CATEGORY" name="category" />
<field column="ADDRESSALLROWID" name="ADDRESS_ACCT_ALL.RECORD_ID_abc" />
<field column="ADDRESSALLADDRTYPECD" name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
<field column="ADDRESSALLLONGITUDE" name="ADDRESS_ACCT_ALL.LONGITUDE_abc" />
<field column="ADDRESSALLLATITUDE" name="ADDRESS_ACCT_ALL.LATITUDE_abc" />
<field column="ADDRESSALLADDRNAME" name="ADDRESS_ACCT_ALL.ADDR_NAME_abc" />
<field column="ADDRESSALLCITY" name="ADDRESS_ACCT_ALL.CITY_abc" />
<field column="ADDRESSALLSTATE" name="ADDRESS_ACCT_ALL.STATE_abc" />
<field column="ADDRESSALLEMAILADDR" name="ADDRESS_ACCT_ALL.EMAIL_ADDR_abc" />

</entity>

<!-- Variables -->
<!-- '${dataimporter.last_index_time}' -->
</document>
</dataConfig>


On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

In this case we are indexing an Oracle database.

We do not include the data-config.xml in our distribution. We store the database information in the database.xml file. I have attached the database.xml file.

When we use the default merge policy settings, we get the same results.

We have not tried to dump the table to a comma-separated file. We think that dumping a table of this size to disk will introduce other memory problems with big file management. We have not tested that case.


On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Which database are you using? Can you send us data-config.xml?

What happens when you use the default merge policy settings?

What happens when you dump your table to a comma-separated file and feed that file to Solr?

Ahmet
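For reference, a rough sketch of that CSV route, assuming the table can be spooled from SQL*Plus and that the core's update handler accepts text/csv (file name, host, and the abbreviated column list are illustrative, and a real export would still need to handle commas, quotes, and NULLs inside the address fields):

-- SQL*Plus: spool the table to a flat file
SET HEADING OFF
SET FEEDBACK OFF
SET PAGESIZE 0
SET LINESIZE 4000
SPOOL address_acct_all.csv
SELECT RECORD_ID || ',' || ADDR_TYPE_CD || ',' || CITY || ',' || STATE || ',' || EMAIL_ADDR
  FROM ORCL.ADDRESS_ACCT_ALL;
SPOOL OFF

The file (with a header line naming the Solr fields, or a fieldnames parameter on the request) can then be streamed straight into Solr:

curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/csv" --data-binary @address_acct_all.csv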
On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

The ramBufferSizeMB was set to 6MB only on the test system, to make the system crash sooner. In production that tag is commented out, which I believe forces the default value to be used.


On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Out of curiosity, why did you set ramBufferSizeMB to 6?

Ahmet


On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

Main issue: Full indexing is causing a Java heap Out of Memory exception.

SOLR/Lucene version: 4.2.1

JVM version:
Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

Indexer startup command:

set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m

java %JVMARGS% ^
-Dcom.sun.management.jmxremote.port=1092 ^
-Dcom.sun.management.jmxremote.ssl=false ^
-Dcom.sun.management.jmxremote.authenticate=false ^
-jar start.jar

SOLR indexing HTTP parameters request:

webapp=/solr path=/dataimport params={clean=false&command=full-import&wt=javabin&version=2}
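For reference, the same request issued by hand would look roughly like this (host, port, and webapp path are illustrative; DIH's commit parameter defaults to true, so a single commit is made when the import finishes):

curl "http://localhost:8983/solr/dataimport?command=full-import&clean=false"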
We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes away, but we believe the problem has not been fixed and that we will eventually get the same OOM exception. We have other processes on the server that also require resources, so we cannot continually increase the memory settings to resolve the OOM issue. We are trying to find a way to configure the SOLR instance to reduce or, preferably, eliminate the possibility of an OOM exception.

We can reproduce the problem on a test machine. We set the Java heap memory size to 64MB to accelerate the exception. If we increase this setting, the same problem occurs, just hours later. In the test environment, we are using the following parameters:

JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m

Normally we use the default solrconfig.xml file with only the following jar file references added:

<lib path="../../../../default/lib/common.jar" />
<lib path="../../../../default/lib/webapp.jar" />
<lib path="../../../../default/lib/commons-pool-1.4.jar" />

Using these values and trying to index 6 million records from the database, the Java heap Out of Memory exception is thrown very quickly.

We were able to complete a successful indexing by further modifying the solrconfig.xml and removing all or all but one <copyField> tag from the schema.xml file.

The following solrconfig.xml values were modified:

<ramBufferSizeMB>6</ramBufferSizeMB>

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">2</int>
  <int name="maxMergeAtOnceExplicit">2</int>
  <int name="segmentsPerTier">10</int>
  <int name="maxMergedSegmentMB">150</int>
</mergePolicy>

<autoCommit>
  <maxDocs>15000</maxDocs> <!-- this tag was maxTime before -->
  <openSearcher>false</openSearcher>
</autoCommit>
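For comparison, the "default merge policy settings" runs mentioned earlier in the thread amount to simply dropping the custom block so Lucene's TieredMergePolicy defaults apply (a sketch; the rest of the index configuration is left as is):

<!-- custom merge tuning removed; TieredMergePolicy defaults apply
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  ...
</mergePolicy>
-->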
Using our customized schema.xml with two or more <copyField> tags, the OOM exception is always thrown. Based on the errors, the problem occurs when the process is trying to do the merge. The error is provided below:

Exception in thread "Lucene Merge Thread #156" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
        at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
        at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)
        at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)

Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log
SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
        at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2744)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

We think, but are not 100% sure, that the problem is related to the merge.

Normally our schema.xml contains a lot of field specifications (like the ones seen in the file fragment below):

<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_abc" />
<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case.soundex_abc" />
<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_nvl_abc" />

In tests using the default schema.xml file and no <copyField> tags, indexing completed successfully. 6 million records produced a 900 MB data directory.

When I included just one <copyField> tag, indexing completed successfully. 6 million records produced a 990 MB data directory (90 MB bigger).

When I included just two <copyField> tags, the index crashed with an OOM exception.

Changing parameters like maxMergedSegmentMB or maxDocs only postponed the crash.

The net of our test results is as follows:

solrconfig.xml                      schema.xml                          result
default plus only jar references    default (no copyField tags)         success
default plus only jar references    modified with one copyField tag     success
default plus only jar references    modified with two copyField tags    crash
additional modified settings        default (no copyField tags)         success
additional modified settings        modified with one copyField tag     success
additional modified settings        modified with two copyField tags    crash

Our question is: what can we do to eliminate these OOM exceptions?