On 10/14/2012 5:45 PM, Erick Erickson wrote:
About your second point. Try committing more often with openSearcher set to false. There's a bit here: http://wiki.apache.org/solr/SolrConfigXml

<autoCommit>
  <maxDocs>10000</maxDocs> <!-- maximum uncommitted docs before autocommit triggered -->
  <maxTime>15000</maxTime> <!-- maximum time (in MS) after adding a doc before an autocommit is triggered -->
  <openSearcher>false</openSearcher> <!-- SOLR 4.0. Optionally don't open a searcher on hard commit. This is useful to minimize the size of transaction logs that keep track of uncommitted updates. -->
</autoCommit>

That should keep the size of the transaction log down to reasonable levels...
I have autocommit turned completely off -- both maxDocs and maxTime are set to zero. The DIH import from MySQL, over 12 million rows per shard, is done in one go on all my build cores at once, then I swap cores. It takes a little over three hours and produces a 22GB index. I have batchSize set to -1 so that the JDBC driver streams the records.
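A minimal sketch of the setup described above, assuming a stock JdbcDataSource with the MySQL driver. The host, database, credentials, entity name, and query are hypothetical placeholders, not the actual configuration.

```xml
<!-- solrconfig.xml: autocommit disabled, both limits set to zero -->
<autoCommit>
  <maxDocs>0</maxDocs>
  <maxTime>0</maxTime>
</autoCommit>

<!-- DIH data config: batchSize="-1" asks the MySQL driver to stream rows
     instead of buffering the whole result set in memory.
     All connection details and the query below are placeholders. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://dbhost:3306/exampledb"
              user="solr"
              password="placeholder"
              batchSize="-1"/>
  <document>
    <entity name="item" query="SELECT id, title FROM example_table"/>
  </document>
</dataConfig>
```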
When I first set this up back on 1.4.1, I had some kind of severe problem when autocommit was turned on. I can no longer remember what it caused, but it was a huge showstopper of some kind.
Thanks, Shawn
