On 3/24/2013 10:02 AM, Niran Fajemisin wrote:
> We import about 1.5 million documents on a nightly basis using DIH. During this
> time, we need to ensure that all documents make it into index otherwise
> rollback on any errors; which DIH takes care of for us. We also disable
> autoCommit in DIH but instruct it to commit at the very end of the import. This
> is all done through configuration of the DIH config XML file and the command
> issued to the request handler.
>
> We have noticed that the tlog file appears to linger around even after DIH has
> issued the hard commit. My expectation would be that after the hard commit has
> occurred, the tlog file will be removed. I'm obviously misunderstanding how
> this all works.
You've already gotten the reason for the giant tlog hanging around.
The way to actually fix this problem is to turn on autoCommit with
either maxDocs or maxTime set relatively low. The key to enabling
autoCommit without changing anything about how your import process
works is to make sure that openSearcher is set to false in the
autoCommit section:
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <updateLog />
</updateHandler>
I make maxDocs low rather than maxTime, but that's up to you. Each hard
commit done by autoCommit will create a new tlog, and each tlog will be
fairly small. Only a few of them will be kept around, so the disk space
requirement will be small, and restarting Solr will be fast because
there won't be a lot of data to replay.
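
As a rough back-of-the-envelope check, here is the arithmetic for your numbers. This is only a sketch: it assumes the 1.5 million document figure from your message, and that maxDocs (not maxTime) is what triggers every autoCommit. The function name is made up for illustration and is not anything Solr provides.

```python
def estimate_autocommits(total_docs: int, max_docs: int) -> int:
    """Estimate how many maxDocs-triggered hard commits an import causes.

    Assumption: maxTime never fires first. If it does, there will be
    more commits, and each tlog will be smaller still.
    """
    # Integer ceiling division: a final partial batch still needs a commit.
    return -(-total_docs // max_docs)


commits = estimate_autocommits(1_500_000, 25_000)
print(commits)  # 60
```

So instead of one tlog covering all 1.5 million documents, each tlog would cover at most about 25,000 of them.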
With openSearcher set to false, there will be NO changes in document
visibility. Searches will continue using the old searcher, so the old
documents will still be there and the new documents will NOT be
searchable until DIH does its explicit commit at the end.
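
For reference, the explicit commit at the end comes from the command you send to the DIH request handler. A minimal sketch of building that request URL follows. The host, port, core name (collection1) and handler path (/dataimport) are assumptions here, so substitute whatever your solrconfig.xml actually defines. The helper function is hypothetical; command=full-import and commit=true are the DIH request parameters.

```python
from urllib.parse import urlencode


def build_full_import_url(base_url: str, handler: str = "/dataimport") -> str:
    """Build a DIH full-import URL that asks for a commit when the import ends.

    base_url and handler are placeholders; use the core URL and the
    handler name from your own configuration.
    """
    params = {"command": "full-import", "commit": "true"}
    return f"{base_url.rstrip('/')}{handler}?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
url = build_full_import_url("http://localhost:8983/solr/collection1")
print(url)
# http://localhost:8983/solr/collection1/dataimport?command=full-import&commit=true
```

Nothing about this request needs to change when autoCommit is enabled with openSearcher set to false.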
The one thing that I'm not sure about is what happens if Solr or the
machine crashes in the middle of the import. Complete rollback might
not be possible. Someone with better knowledge may have to comment there.
Thanks,
Shawn