On 4/25/2013 9:00 AM, xiaoqi wrote:
> i using DIH to build index is slow , when it fetch 2 million rows , it will
> spend 20 minutes , very slow.
If it takes 20 minutes for two million records, I'd say it's working very well.

I do six simultaneous MySQL imports of 13 million records each. They take a little over three hours on Solr 3.5.0 and a little over four hours on Solr 4.2.1 (due to compression and the transaction log). If I ran them one at a time instead of all at once, each one would go *slightly* faster, but the overall process would take a whole day. For comparison purposes, that works out to roughly 20 minutes per million rows for each import. Yours is doing two million rows in 20 minutes, or about 10 minutes per million, so it's going twice as fast as mine.

Looking at your config file, I don't see a batchSize parameter. What I'm about to suggest is specific to MySQL. You can greatly reduce memory usage by including this attribute in the dataSource tag, along with the user and password:

batchSize="-1"

With two million records and no batchSize parameter, I'm surprised you aren't hitting an OutOfMemoryError. By default, the MySQL JDBC driver pulls down the entire result set and holds it in memory, and only then does DIH begin indexing. A batchSize of -1 makes DIH tell the MySQL JDBC driver to stream the results instead of storing them. Reducing memory usage in this way might also make the import go faster.

Thanks,
Shawn
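For reference, here is a minimal sketch of a DIH data-config.xml with the attribute in place on the dataSource tag. The driver class is the MySQL Connector/J driver; the JDBC URL, database name, credentials, table, and column names are hypothetical placeholders, not taken from your config.

```xml
<dataConfig>
  <!-- Hypothetical connection details: substitute your own URL, user, and password. -->
  <!-- batchSize="-1" asks DIH to have the MySQL driver stream rows instead of buffering them all. -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="solr"
              password="secret"
              batchSize="-1"/>
  <document>
    <!-- Hypothetical table and columns, for illustration only. -->
    <entity name="item" query="SELECT id, title FROM item">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

Only the batchSize attribute is the point here; the rest of your existing dataSource tag and entity definitions can stay as they are.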