On 4/18/2014 6:15 PM, Candygram For Mongo wrote:
> We are getting Out Of Memory errors when we try to execute a full import
> using the Data Import Handler.  This error originally occurred on a
> production environment with a database containing 27 million records.  Heap
> memory was configured for 6GB and the server had 32GB of physical memory.
>  We have been able to replicate the error on a local system with 6 million
> records.  We set the memory heap size to 64MB to accelerate the error
> replication.  The indexing process has been failing in different scenarios.
>  We have 9 test cases documented.  In some of the test cases we increased
> the heap size to 128MB.  In our first test case we set heap memory to 512MB
> which also failed.

One characteristic of a JDBC connection is that unless you tell it
otherwise, it will try to retrieve the entire resultset into RAM before
any results are delivered to the application.  It's not Solr doing
this; it's JDBC.

In this case, there are 27 million rows in the resultset.  It's highly
unlikely that this much data (along with the rest of Solr's memory
requirements) will fit in 6GB of heap.

JDBC has a built-in way to deal with this.  It's called fetchSize.  By
using the batchSize parameter on your JdbcDataSource config, you can set
the JDBC fetchSize.  Set it to something small, between 100 and 1000,
and you'll probably get rid of the OOM problem.

http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource
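
For example, the dataSource entry in your DIH config might look
something like this -- the driver, URL, and credentials below are just
placeholders for an Oracle setup, batchSize is the part that matters:

  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@dbhost:1521:ORCL"
              user="solr_user"
              password="secret"
              batchSize="500"/>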

If you had been using MySQL, I would have recommended that you set
batchSize to -1.  This sets fetchSize to Integer.MIN_VALUE, which tells
the MySQL driver to stream results instead of trying to either batch
them or return everything.  I'm pretty sure that the Oracle driver
doesn't work this way -- you would have to modify the dataimport source
code to use their streaming method.
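
For reference, the MySQL version of that config would look something
like this (connection details are again placeholders), with
batchSize="-1" turning on the streaming behavior described above:

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://dbhost:3306/mydb"
              user="solr_user"
              password="secret"
              batchSize="-1"/>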

Thanks,
Shawn
