On 10/14/2012 5:45 PM, Erick Erickson wrote:
About your second point: try committing more often with openSearcher
set to false.
There's a bit about it here:
http://wiki.apache.org/solr/SolrConfigXml
<autoCommit>
  <!-- maximum uncommitted docs before autocommit triggered -->
  <maxDocs>10000</maxDocs>
  <!-- maximum time (in ms) after adding a doc before an autocommit
       is triggered -->
  <maxTime>15000</maxTime>
  <!-- Solr 4.0. Optionally don't open a searcher on hard commit.
       This is useful to minimize the size of transaction logs that
       keep track of uncommitted updates. -->
  <openSearcher>false</openSearcher>
</autoCommit>
That should keep the size of the transaction log down to reasonable levels...
I have autocommit turned completely off, with both maxDocs and maxTime
set to zero. The DIH import from MySQL, over 12 million rows per shard,
is done in one go on all my build cores at once, and then I swap cores.
It takes a little over three hours and produces a 22GB index. I have
batchSize set to -1 so that JDBC streams the records rather than
buffering the whole result set in memory.
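
A minimal sketch of what the setup described above might look like, for
anyone following along. The values in the first fragment come from the
description (both autocommit triggers at zero). In the second fragment
only batchSize="-1" comes from the description; the driver class, JDBC
URL, user, and password are placeholder assumptions, not the actual
configuration.

```xml
<!-- solrconfig.xml: autocommit disabled by setting both triggers to zero -->
<autoCommit>
  <maxDocs>0</maxDocs>
  <maxTime>0</maxTime>
</autoCommit>
```

```xml
<!-- DIH data-config.xml: batchSize="-1" makes the MySQL driver stream rows.
     driver, url, user, and password below are hypothetical placeholders. -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://dbhost:3306/dbname"
            batchSize="-1"
            user="solr_user"
            password="changeme"/>
```

With autocommit off like this, nothing is committed until an explicit
commit is issued, for example the one DIH sends at the end of a
full-import by default.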
When I first set this up back on Solr 1.4.1, I ran into a severe
problem with autocommit turned on. I can no longer remember exactly
what it caused, but it was a showstopper.
Thanks,
Shawn