On 3/24/2013 10:02 AM, Niran Fajemisin wrote:
We import about 1.5 million documents on a nightly basis using DIH. During this 
time, we need to ensure that all documents make it into the index, or else 
roll back on any error, which DIH takes care of for us. We also disable 
autoCommit in DIH but instruct it to commit at the very end of the import. This 
is all done through configuration of the DIH config XML file and the command 
issued to the request handler.
We have noticed that the tlog file appears to linger around even after DIH has 
issued the hard commit. My expectation would be that after the hard commit has 
occurred, the tlog file will be removed. I'm obviously misunderstanding how 
this all works.

You've already gotten the reason for the giant tlog hanging around.

The way to actually fix this problem is to turn on autoCommit with one of the values set relatively low. The key to enabling autoCommit without changing anything about how your import process works is to make sure that openSearcher is set to false in the autoCommit section:
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <updateLog />
</updateHandler>

I make maxDocs low rather than maxTime, but that's up to you. Each hard commit done by autoCommit will create a new tlog, and each tlog will be fairly small. Only a few of them will be kept around, so the disk space requirement will be small, and restarting Solr will be fast because there won't be a lot of data to replay.
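
As a rough back-of-the-envelope check, here is a small Python sketch that counts how many hard commits the settings above would trigger during one import. It assumes the maxDocs threshold always fires before maxTime and uses the 1.5 million figure from your message; the function name is made up for illustration.

```python
def autocommit_count(total_docs: int, max_docs: int) -> int:
    """Number of maxDocs-triggered hard commits during one import.

    Assumption: maxTime never fires first, and the final explicit
    commit issued by DIH is not counted.
    """
    if max_docs <= 0:
        raise ValueError("max_docs must be positive")
    return total_docs // max_docs


# 1.5 million documents with maxDocs=25000
print(autocommit_count(1_500_000, 25_000))  # 60
```

Under those assumptions, each tlog covers at most roughly 25000 documents' worth of updates instead of the whole import.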
With openSearcher set to false, there will be NO changes in document 
visibility.  Searches will continue using the old searcher, so the old 
documents will still be there and the new documents will NOT be 
searchable until DIH does its explicit commit at the end.
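
For reference, the explicit commit at the end comes from the commit parameter on the import command. A minimal Python sketch that builds such a request URL is below; the host, the core name "mycore", and the "/dataimport" handler path are assumptions, so substitute whatever your setup uses.

```python
from urllib.parse import urlencode


def full_import_url(base_url: str, core: str, handler: str = "/dataimport") -> str:
    """Build a DIH full-import URL that asks for a commit when the import ends.

    base_url, core and handler are placeholders for illustration only.
    """
    params = {"command": "full-import", "commit": "true"}
    return f"{base_url.rstrip('/')}/{core}{handler}?{urlencode(params)}"


print(full_import_url("http://localhost:8983/solr", "mycore"))
# http://localhost:8983/solr/mycore/dataimport?command=full-import&commit=true
```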
The one thing that I'm not sure about is what happens if Solr or the 
machine crashes in the middle of the import.  Complete rollback might 
not be possible.  Someone with better knowledge may have to comment there.
Thanks,
Shawn
