Hi,

Can you remove auto commit for the bulk import and commit only at the very end?
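For reference, a minimal sketch of what that could look like in solrconfig.xml (based on the stock 4.x example config; adjust to your actual file). With autoCommit disabled for the bulk load, the DataImportHandler full-import still issues a single commit at the end as long as its commit parameter is left at the default of true:

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- autoCommit disabled for the bulk import; rely on the one commit
       that DIH issues when full-import finishes (commit=true is the default) -->
  <!--
  <autoCommit>
    <maxDocs>15000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
  -->
</updateHandler>

The import can then be started exactly as before (command=full-import&clean=false); nothing becomes visible to searchers until that final commit.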
Ahmet

On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

In case the attached database.xml file didn't show up, I have pasted in the contents below:

<dataConfig>
  <dataSource name="org_only"
              type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
              user="admin"
              password="admin"
              readOnly="false"
              batchSize="100" />
  <document>
    <entity name="full-index" query="
        select
          NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID,
          'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY,
          NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID,
          NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD,
          NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE,
          NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE,
          NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME,
          NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY,
          NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE,
          NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR
        from ORCL.ADDRESS_ACCT_ALL ">
      <field column="SOLR_ID" name="id" />
      <field column="SOLR_CATEGORY" name="category" />
      <field column="ADDRESSALLROWID" name="ADDRESS_ACCT_ALL.RECORD_ID_abc" />
      <field column="ADDRESSALLADDRTYPECD" name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
      <field column="ADDRESSALLLONGITUDE" name="ADDRESS_ACCT_ALL.LONGITUDE_abc" />
      <field column="ADDRESSALLLATITUDE" name="ADDRESS_ACCT_ALL.LATITUDE_abc" />
      <field column="ADDRESSALLADDRNAME" name="ADDRESS_ACCT_ALL.ADDR_NAME_abc" />
      <field column="ADDRESSALLCITY" name="ADDRESS_ACCT_ALL.CITY_abc" />
      <field column="ADDRESSALLSTATE" name="ADDRESS_ACCT_ALL.STATE_abc" />
      <field column="ADDRESSALLEMAILADDR" name="ADDRESS_ACCT_ALL.EMAIL_ADDR_abc" />
    </entity>
    <!-- Variables -->
    <!-- '${dataimporter.last_index_time}' -->
  </document>
</dataConfig>

On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

In this case we are indexing an Oracle database.

We do not include the data-config.xml in our distribution. We store the database information in the database.xml file. I have attached the database.xml file.

When we use the default merge policy settings, we get the same results.

We have not tried to dump the table to a comma-separated file. We think that dumping a table of this size to disk will introduce other memory problems with big file management. We have not tested that case.

On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Which database are you using? Can you send us data-config.xml?

What happens when you use the default merge policy settings?

What happens when you dump your table to a comma-separated file and feed that file to Solr?

Ahmet

On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

The ramBufferSizeMB was set to 6MB only on the test system to make the system crash sooner. In production that tag is commented out, which I believe forces the default value to be used.

On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Out of curiosity, why did you set ramBufferSizeMB to 6?

Ahmet
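Regarding the comma-separated-file question quoted above: if that route is ever tested, the dump does not have to be held in memory in one piece. Solr can stream the file from local disk via the stream.file parameter once remote streaming is enabled in solrconfig.xml. A minimal sketch, where the upload limit is just the value from the stock example config:

<requestDispatcher handleSelect="false">
  <!-- allow stream.file / stream.url so a large CSV dump can be read
       from disk instead of being sent in one request body -->
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048000" />
</requestDispatcher>

The import would then be a request along the lines of /update?stream.file=/path/to/dump.csv&stream.contentType=text/csv;charset=utf-8&commit=true (the path is only a placeholder), which lets Solr read the file incrementally rather than buffering it in the heap.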
On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

*Main issue:* Full indexing is causing a Java heap out-of-memory exception.

*SOLR/Lucene version:* 4.2.1

*JVM version:*

Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

*Indexer startup command:*

set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m

java " %JVMARGS% ^
-Dcom.sun.management.jmxremote.port=1092 ^
-Dcom.sun.management.jmxremote.ssl=false ^
-Dcom.sun.management.jmxremote.authenticate=false ^
-jar start.jar

*SOLR indexing HTTP parameters request:*

webapp=/solr path=/dataimport params={clean=false&command=full-import&wt=javabin&version=2}

We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes away, but we believe the problem has not been fixed and that we will eventually get the same OOM exception. We have other processes on the server that also require resources, so we cannot continually increase the memory settings to resolve the OOM issue. We are trying to find a way to configure the SOLR instance to reduce or preferably eliminate the possibility of an OOM exception.

We can reproduce the problem on a test machine. We set the Java heap memory size to 64MB to accelerate the exception. If we increase this setting, the same problem occurs, just hours later. In the test environment, we are using the following parameters:

JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m

Normally we use the default solrconfig.xml file with only the following jar file references added:

<lib path="../../../../default/lib/common.jar" />
<lib path="../../../../default/lib/webapp.jar" />
<lib path="../../../../default/lib/commons-pool-1.4.jar" />

Using these values and trying to index 6 million records from the database, the Java heap out-of-memory exception is thrown very quickly.

We were able to complete a successful indexing by further modifying the solrconfig.xml and removing all, or all but one, of the <copyField> tags from the schema.xml file.

The following solrconfig.xml values were modified:

<ramBufferSizeMB>6</ramBufferSizeMB>

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">2</int>
  <int name="maxMergeAtOnceExplicit">2</int>
  <int name="segmentsPerTier">10</int>
  <int name="maxMergedSegmentMB">150</int>
</mergePolicy>

<autoCommit>
  <maxDocs>15000</maxDocs> <!-- This tag was maxTime before this -->
  <openSearcher>false</openSearcher>
</autoCommit>

Using our customized schema.xml file with two or more <copyField> tags, the OOM exception is always thrown. Based on the errors, the problem occurs when the process is trying to do the merge.
The error is provided below:

Exception in thread "Lucene Merge Thread #156" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
        at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
        at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)
        at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)

Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log
SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
        at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2744)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

We think, but are not 100% sure, that the problem is related to the merge.
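Since the first trace dies inside a concurrent merge, one knob that is sometimes used to keep merge memory bounded (not something tried in this thread, so treat it as an untested suggestion) is capping how many merges run at once. A sketch for the <indexConfig> section of solrconfig.xml, alongside the mergePolicy block already shown:

<indexConfig>
  <!-- at most one merge runs at a time; if more than two merges back up,
       indexing stalls instead of piling up concurrent merges in the heap -->
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">2</int>
  </mergeScheduler>
</indexConfig>

With a single merge thread, indexing throughput drops somewhat, but the memory high-water mark during merging is lower.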
Normally our schema.xml contains a lot of field specifications (like the ones seen in the file fragment below):

<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_abc" />
<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case.soundex_abc" />
<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_nvl_abc" />

In tests using the default schema.xml file and no <copyField> tags, indexing completed successfully. 6 million records produced a 900 MB data directory.

When I included just one <copyField> tag, indexing completed successfully. 6 million records produced a 990 MB data directory (90 MB bigger).

When I included just two <copyField> tags, the index crashed with an OOM exception.

Changing parameters like maxMergedSegmentMB or maxDocs only postponed the crash.

The net of our test results is as follows:

solrconfig.xml                      schema.xml                          result
default plus only jar references    default (no copyField tags)         success
default plus only jar references    modified with one copyField tag     success
default plus only jar references    modified with two copyField tags    crash
additional modified settings        default (no copyField tags)         success
additional modified settings        modified with one copyField tag     success
additional modified settings        modified with two copyField tags    crash

Our question is: what can we do to eliminate these OOM exceptions?
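One hedged observation on the copyField pattern above, offered as an untested idea rather than a known fix: the merge trace fails in SegmentMerger.mergeNorms, and each additional indexed <copyField> target is another field whose norms have to be materialized during a merge. If the copy targets (the case, soundex and nvl variants) do not need length normalization for scoring, declaring them with omitNorms="true" in schema.xml would remove that per-field cost. The field names, types and stored flags below are placeholders taken from the fragment above, not your real definitions:

<!-- sketch only: keep each target field's real type; the change is omitNorms="true" -->
<field name="ADDRESS.RECORD_ID.case_abc"
       type="text_general" indexed="true" stored="false"
       omitNorms="true" />
<field name="ADDRESS.RECORD_ID.case.soundex_abc"
       type="text_general" indexed="true" stored="false"
       omitNorms="true" />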