On 3/17/2014 12:39 PM, solr2020 wrote:
Previously we faced OOM errors when we tried to index 1.2M records in a single pass. We have now divided the data into two chunks and index them separately. We no longer get OOM, but heap usage is still high, so we are analyzing it to find the cause and make sure we don't hit OOM again.
How are you indexing? A previous message you sent to the mailing list indicates that your source is a DB table.
If that's true, can you share the dataSource section(s) from your dataimport handler configuration? You might be running into a situation where DIH is retrieving the entire dataset via JDBC.
For the MySQL JDBC driver, you can avoid this by setting the batchSize parameter to -1. This causes the driver to stream results from the server rather than reading the entire result set into memory. Other JDBC drivers may need different settings.
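As a rough sketch of what that looks like in the DIH config (the driver class, URL, and credentials here are placeholders; adjust them for your setup):

```xml
<!-- data-config.xml: batchSize="-1" tells the MySQL JDBC driver
     to stream rows instead of buffering the whole result set -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/mydb"
            user="dbuser"
            password="dbpass"
            batchSize="-1"/>
```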
http://mysolr.com/tips/dataimporthandler-runs-out-of-memory-on-large-table/

Thanks,
Shawn