I noticed an enormous number of commits, which plausibly triggers merges that hit the OOME. Try disabling autocommit completely, and monitor commit occurrences in the log.
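For example, a minimal sketch of what that might look like in solrconfig.xml (assuming the stock DirectUpdateHandler2 setup; the maxDocs/maxTime values shown are placeholders):

<!-- solrconfig.xml: disable autocommit entirely by commenting out
     (or deleting) the autoCommit block, so commits happen only when
     a client explicitly issues them. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!--
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
  -->
</updateHandler>

With autocommit off, any commit that still shows up in the log is coming from the client side, e.g. the DIH full-import itself, which commits at the end of the run unless you start it with commit=false.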
On Sun, Apr 20, 2014 at 9:12 PM, Candygram For Mongo <candygram.for.mo...@gmail.com> wrote:

> We have tried using fetchSize and we still got the same out of memory
> errors.
>
> On Fri, Apr 18, 2014 at 9:39 PM, Shawn Heisey <s...@elyograg.org> wrote:
>
> > On 4/18/2014 6:15 PM, Candygram For Mongo wrote:
> > > We are getting Out Of Memory errors when we try to execute a full
> > > import using the Data Import Handler. This error originally occurred
> > > on a production environment with a database containing 27 million
> > > records. Heap memory was configured for 6GB and the server had 32GB
> > > of physical memory. We have been able to replicate the error on a
> > > local system with 6 million records. We set the memory heap size to
> > > 64MB to accelerate the error replication. The indexing process has
> > > been failing in different scenarios. We have 9 test cases documented.
> > > In some of the test cases we increased the heap size to 128MB. In our
> > > first test case we set heap memory to 512MB which also failed.
> >
> > One characteristic of a JDBC connection is that unless you tell it
> > otherwise, it will try to retrieve the entire resultset into RAM before
> > any results are delivered to the application. It's not Solr doing this,
> > it's JDBC.
> >
> > In this case, there are 27 million rows in the resultset. It's highly
> > unlikely that this much data (along with the rest of Solr's memory
> > requirements) will fit in 6GB of heap.
> >
> > JDBC has a built-in way to deal with this. It's called fetchSize. By
> > using the batchSize parameter on your JdbcDataSource config, you can set
> > the JDBC fetchSize. Set it to something small, between 100 and 1000,
> > and you'll probably get rid of the OOM problem.
> >
> > http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource
> >
> > If you had been using MySQL, I would have recommended that you set
> > batchSize to -1. This sets fetchSize to Integer.MIN_VALUE, which tells
> > the MySQL driver to stream results instead of trying to either batch
> > them or return everything. I'm pretty sure that the Oracle driver
> > doesn't work this way -- you would have to modify the dataimport source
> > code to use their streaming method.
> >
> > Thanks,
> > Shawn

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
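For reference, a minimal sketch of the batchSize setting Shawn describes, in data-config.xml (the driver class, URL, credentials, and entity query below are hypothetical placeholders for an Oracle setup):

<!-- data-config.xml: batchSize on JdbcDataSource is passed through to
     the JDBC driver as fetchSize. Connection details are placeholders. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
              user="solr_user"
              password="secret"
              batchSize="500"/>
  <document>
    <entity name="item" query="SELECT id, name FROM items"/>
  </document>
</dataConfig>

With MySQL, the special value batchSize="-1" maps to fetchSize=Integer.MIN_VALUE and turns on row streaming, as Shawn notes; for Oracle a small positive value is the right choice, since its driver uses fetchSize to limit how many rows are buffered per round trip.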