We have tried using fetchSize and we still get the same out-of-memory errors.
On Fri, Apr 18, 2014 at 9:39 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 4/18/2014 6:15 PM, Candygram For Mongo wrote:
> > We are getting Out Of Memory errors when we try to execute a full import
> > using the Data Import Handler. This error originally occurred on a
> > production environment with a database containing 27 million records.
> > Heap memory was configured for 6GB and the server had 32GB of physical
> > memory. We have been able to replicate the error on a local system with
> > 6 million records. We set the memory heap size to 64MB to accelerate the
> > error replication. The indexing process has been failing in different
> > scenarios. We have 9 test cases documented. In some of the test cases we
> > increased the heap size to 128MB. In our first test case we set heap
> > memory to 512MB, which also failed.
>
> One characteristic of a JDBC connection is that unless you tell it
> otherwise, it will try to retrieve the entire result set into RAM before
> any results are delivered to the application. It's not Solr doing this,
> it's JDBC.
>
> In this case, there are 27 million rows in the result set. It's highly
> unlikely that this much data (along with the rest of Solr's memory
> requirements) will fit in 6GB of heap.
>
> JDBC has a built-in way to deal with this. It's called fetchSize. By
> using the batchSize parameter on your JdbcDataSource config, you can set
> the JDBC fetchSize. Set it to something small, between 100 and 1000,
> and you'll probably get rid of the OOM problem.
>
> http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource
>
> If you had been using MySQL, I would have recommended that you set
> batchSize to -1. This sets fetchSize to Integer.MIN_VALUE, which tells
> the MySQL driver to stream results instead of trying to either batch
> them or return everything. I'm pretty sure that the Oracle driver
> doesn't work this way -- you would have to modify the dataimport source
> code to use their streaming method.
>
> Thanks,
> Shawn
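For reference, Shawn's batchSize suggestion maps onto the DIH configuration roughly like this. This is a minimal sketch, not the poster's actual config: the driver class, JDBC URL, credentials, entity name, and query are placeholder assumptions for an Oracle setup, not values taken from this thread.

```xml
<!-- db-data-config.xml (sketch): JdbcDataSource with batchSize set.
     DIH passes batchSize through to the JDBC driver's fetch size,
     so rows are retrieved in chunks instead of all at once. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
              user="solr_user"
              password="secret"
              batchSize="500"/>  <!-- small value, per the 100-1000 suggestion -->
  <document>
    <!-- hypothetical entity/query for illustration -->
    <entity name="item" query="SELECT id, name FROM items"/>
  </document>
</dataConfig>
```

For MySQL, the same attribute set to `batchSize="-1"` is the variant Shawn describes: DIH sets fetchSize to Integer.MIN_VALUE, which Connector/J interprets as a request to stream results row by row.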