I might have forgotten to mention that we are using the DataImportHandler. I think we know how to remove the autoCommit settings. How would we force a commit at the end?
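Just to confirm I understand the suggestion, I believe it amounts to something like the sketch below (the host, core name, and handler path are placeholders, not our actual values): comment out autoCommit in solrconfig.xml for the bulk load, then let the full-import request itself issue a single commit when it finishes.

<!-- solrconfig.xml (sketch): autoCommit disabled for the bulk load -->
<!--
<autoCommit>
  <maxDocs>15000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
-->

http://localhost:8983/solr/core1/dataimport?command=full-import&clean=false&commit=true&optimize=false

If I read the DataImportHandler documentation correctly, commit=true is already the default for full-import, so the real change on our side would be disabling autoCommit during the load and relying on that one commit at the end.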
On Fri, Apr 4, 2014 at 3:18 PM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

We would be happy to try that. That sounds counterintuitive for the high volume of records we have. Can you help me understand how that might solve our problem?


On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Can you remove auto commit for the bulk import and commit at the very end?

Ahmet


On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

In case the attached database.xml file didn't show up, I have pasted its contents below:

<dataConfig>
  <dataSource
    name="org_only"
    type="JdbcDataSource"
    driver="oracle.jdbc.OracleDriver"
    url="jdbc:oracle:thin:@test2.abc.com:1521:ORCL"
    user="admin"
    password="admin"
    readOnly="false"
    batchSize="100"
  />
  <document>

    <entity name="full-index" query="
      select
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID,
      'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE,
      NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR
      from ORCL.ADDRESS_ACCT_ALL
      ">

      <field column="SOLR_ID" name="id" />
      <field column="SOLR_CATEGORY" name="category" />
      <field column="ADDRESSALLROWID" name="ADDRESS_ACCT_ALL.RECORD_ID_abc" />
      <field column="ADDRESSALLADDRTYPECD" name="ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc" />
      <field column="ADDRESSALLLONGITUDE" name="ADDRESS_ACCT_ALL.LONGITUDE_abc" />
      <field column="ADDRESSALLLATITUDE" name="ADDRESS_ACCT_ALL.LATITUDE_abc" />
      <field column="ADDRESSALLADDRNAME" name="ADDRESS_ACCT_ALL.ADDR_NAME_abc" />
      <field column="ADDRESSALLCITY" name="ADDRESS_ACCT_ALL.CITY_abc" />
      <field column="ADDRESSALLSTATE" name="ADDRESS_ACCT_ALL.STATE_abc" />
      <field column="ADDRESSALLEMAILADDR" name="ADDRESS_ACCT_ALL.EMAIL_ADDR_abc" />

    </entity>

    <!-- Variables -->
    <!-- '${dataimporter.last_index_time}' -->
  </document>
</dataConfig>


On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

In this case we are indexing an Oracle database.

We do not include the data-config.xml in our distribution. We store the database information in the database.xml file, which I have attached.

When we use the default merge policy settings, we get the same results.

We have not tried dumping the table to a comma-separated file. We think that dumping a table of this size to disk will introduce other memory problems with big-file management. We have not tested that case.
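(If we do test that, my rough understanding is that it would look something like the line below. The file path, host, and core name are placeholders, and it assumes remote streaming is enabled in solrconfig.xml, which I have not verified for our setup.)

rem Sketch only: feed a CSV dump of the table to the standard /update handler, with a single commit at the end
curl "http://localhost:8983/solr/core1/update?stream.file=C:/dumps/address_acct_all.csv&stream.contentType=text/csv;charset=utf-8&commit=true"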
On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Which database are you using? Can you send us data-config.xml?

What happens when you use the default merge policy settings?

What happens when you dump your table to a comma-separated file and feed that file to Solr?

Ahmet


On Friday, April 4, 2014 5:10 PM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

The ramBufferSizeMB was set to 6 MB only on the test system, to make the system crash sooner. In production that tag is commented out, which I believe forces the default value to be used.


On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi,

Out of curiosity, why did you set ramBufferSizeMB to 6?

Ahmet


On Friday, April 4, 2014 3:27 AM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

Main issue: full indexing is causing a Java heap out-of-memory exception.

SOLR/Lucene version: 4.2.1

JVM version:

Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

Indexer startup command:

set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m

java %JVMARGS% ^
  -Dcom.sun.management.jmxremote.port=1092 ^
  -Dcom.sun.management.jmxremote.ssl=false ^
  -Dcom.sun.management.jmxremote.authenticate=false ^
  -jar start.jar

SOLR indexing HTTP request parameters:

webapp=/solr path=/dataimport params={clean=false&command=full-import&wt=javabin&version=2}

We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes away, but we believe the problem has not actually been fixed and that we will eventually hit the same OOM exception. We have other processes on the server that also require resources, so we cannot keep increasing the memory settings to resolve the OOM issue. We are trying to find a way to configure the SOLR instance to reduce, or preferably eliminate, the possibility of an OOM exception.

We can reproduce the problem on a test machine. We set the Java heap size to 64 MB to accelerate the exception. If we increase this setting, the same problem occurs, just hours later. In the test environment we are using the following parameters:

JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m

Normally we use the default solrconfig.xml file with only the following jar file references added:

<lib path="../../../../default/lib/common.jar" />
<lib path="../../../../default/lib/webapp.jar" />
<lib path="../../../../default/lib/commons-pool-1.4.jar" />

Using these values and trying to index 6 million records from the database, the Java heap out-of-memory exception is thrown very quickly.
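(As a diagnostic aside: the standard HotSpot flags below can capture a heap dump at the moment of the crash so it can be inspected offline. This is a sketch against the test JVMARGS above, and the dump path is a placeholder.)

rem Sketch only: same test settings plus a heap dump on OutOfMemoryError
set JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=C:\dumps\solr-oom.hprof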
We were able to complete a successful indexing run by further modifying solrconfig.xml and removing all, or all but one, of the <copyField> tags from the schema.xml file.

The following solrconfig.xml values were modified:

<ramBufferSizeMB>6</ramBufferSizeMB>

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">2</int>
  <int name="maxMergeAtOnceExplicit">2</int>
  <int name="segmentsPerTier">10</int>
  <int name="maxMergedSegmentMB">150</int>
</mergePolicy>

<autoCommit>
  <maxDocs>15000</maxDocs> <!-- This tag was maxTime before this change -->
  <openSearcher>false</openSearcher>
</autoCommit>

Using our customized schema.xml file with two or more <copyField> tags, the OOM exception is always thrown. Based on the errors, the problem occurs while the process is trying to do the merge. The error is provided below:

Exception in thread "Lucene Merge Thread #156" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180)
        at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146)
        at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
        at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259)
        at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)

Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log
SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
        at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2744)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
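(For what it's worth, the "Caused by" frames above show norms being loaded via getNormValues while segments are merged, and every additional indexed <copyField> destination carries its own norms. If a destination field is not used for length-normalized relevance scoring, it could be declared with omitNorms="true" in schema.xml. The sketch below reuses one of the copyField destinations from the fragment further down; the type name and the indexed/stored choices are placeholders, not taken from our real schema.)

<!-- Sketch only: hypothetical copyField destination declared without norms -->
<field name="ADDRESS.RECORD_ID.case_abc" type="text_case" indexed="true" stored="false" omitNorms="true" />
<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_abc" />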
We think, but are not 100% sure, that the problem is related to the merge.

Normally our schema.xml contains a lot of field specifications, like the ones in the fragment below:

<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_abc" />
<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case.soundex_abc" />
<copyField source="ADDRESS.RECORD_ID_abc" dest="ADDRESS.RECORD_ID.case_nvl_abc" />

In tests using the default schema.xml file and no <copyField> tags, indexing completed successfully. Six million records produced a 900 MB data directory.

When I included just one <copyField> tag, indexing completed successfully. Six million records produced a 990 MB data directory (90 MB bigger).

When I included just two <copyField> tags, the indexing crashed with an OOM exception.

Changing parameters like maxMergedSegmentMB or maxDocs only postponed the crash.

The net of our test results is as follows:

solrconfig.xml                     schema.xml                          result
default plus only jar references   default (no copyField tags)         success
default plus only jar references   modified with one copyField tag     success
default plus only jar references   modified with two copyField tags    crash
additional modified settings       default (no copyField tags)         success
additional modified settings       modified with one copyField tag     success
additional modified settings       modified with two copyField tags    crash

Our question is: what can we do to eliminate these OOM exceptions?