DIH streams rows one by one. Try setting batchSize="-1" on the dataSource (that is the attribute DIH passes to the JDBC driver as the fetch size); this might help. It may make indexing a bit slower, but memory consumption should be low. The memory is being consumed by the JDBC driver. Also try tuning the -Xmx value for the JVM.
--Noble
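For reference, here is a rough sketch of where that setting would go in data-config.xml. As far as I know, the fetch size mentioned above is exposed on DIH's JdbcDataSource as the batchSize attribute, and -1 asks the driver to stream rather than buffer. The URL, database name, credentials, and the entity query are placeholders, not taken from the original configuration. Whether a given driver accepts or honors the resulting fetch size is driver-specific, so treat this as something to try, not a guaranteed fix.

```xml
<dataConfig>
  <!-- Placeholder connection details; substitute your own. -->
  <!-- batchSize="-1" is the setting suggested above; a small positive
       value (e.g. a few hundred) is the alternative if the driver
       rejects it. -->
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=mydb"
              user="solr_user"
              password="change_me"
              batchSize="-1"/>
  <document>
    <!-- Placeholder query; use your real select statement here. -->
    <entity name="table1" query="select field1 from table1"/>
  </document>
</dataConfig>
```

If the import still runs out of heap with this in place, the remaining suspects are the ones named in this thread: the driver buffering the result set regardless, and Solr's own indexing memory.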

On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar <[EMAIL PROTECTED]> wrote:
> Setting the batchSize to 10000 would mean that the Jdbc driver will keep
> 10000 rows in memory *for each entity* which uses that data source (if
> correctly implemented by the driver). Not sure how well the Sql Server
> driver implements this. Also keep in mind that Solr also needs memory to
> index documents. You can probably try setting the batch size to a lower
> value.
>
> The regular memory tuning stuff should apply here too -- try disabling
> autoCommit and turn-off autowarming and see if it helps.
>
> On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia <[EMAIL PROTECTED]> wrote:
>
>> I'm trying to load ~10 million records into Solr using the
>> DataImportHandler.
>> I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as
>> soon as I try loading more than about 5 million records.
>>
>> Here's my configuration:
>> I'm connecting to a SQL Server database using the sqljdbc driver. I've given
>> my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to
>> 10000. My SQL query is "select top XXX field1, ... from table1". I have
>> about 40 fields in my Solr schema.
>>
>> I thought the DataImportHandler would stream data from the DB rather than
>> loading it all into memory at once. Is that not the case? Any thoughts on
>> how to get around this (aside from getting a machine with more memory)?
>>
>> --
>> View this message in context:
>> http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> --
> Regards,
> Shalin Shekhar Mangar.

--
--Noble Paul