Shawn:

If you do hard commits, then no matter what the openSearcher value is,
those commits will be there when the machine comes back up after a crash.

How I'd approach it if I absolutely _had_ to be able to do a complete
rollback would be something like forcing a replication to a dedicated
machine before the import; then I'd have a backup I could restore if
things crashed.
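A minimal sketch of that forced replication, assuming the dedicated
machine runs a Solr core with the ReplicationHandler registered at
`/replication` and can reach the indexing server. The host names, the core
name `collection1` and the helper `fetchindex_url` are hypothetical. It only
builds the `fetchindex` request; you would send it as a plain HTTP GET and
confirm the copy has finished (for example with `command=details`) before
kicking off the import.

```python
from urllib.parse import urlencode


def fetchindex_url(backup_core_url: str, master_core_url: str) -> str:
    """Build a one-off ReplicationHandler request that asks the backup core
    to pull a copy of the index from the indexing (master) core."""
    params = {
        "command": "fetchindex",
        # masterUrl must point at the master's replication handler
        "masterUrl": master_core_url.rstrip("/") + "/replication",
    }
    return backup_core_url.rstrip("/") + "/replication?" + urlencode(params)


# Hypothetical hosts: "backup-host" keeps the pre-import copy,
# "index-host" is where DIH runs.
url = fetchindex_url("http://backup-host:8983/solr/collection1",
                     "http://index-host:8983/solr/collection1")
print(url)
```

If the import goes bad, the copy on the backup machine is the state you
would restore from.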

But most likely I'd say "we shouldn't worry about this because if our
hardware is that flaky we have bigger problems".....

Best
Erick


On Mon, Mar 25, 2013 at 5:34 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 3/24/2013 10:02 AM, Niran Fajemisin wrote:
>
>> We import about 1.5 million documents on a nightly basis using DIH.
>> During this time, we need to ensure that all documents make it into the
>> index, or otherwise roll back on any errors, which DIH takes care of for
>> us. We also disable autoCommit in DIH but instruct it to commit at the
>> very end of the import. This is all done through configuration of the DIH
>> config XML file and the command issued to the request handler.
>>
>> We have noticed that the tlog file appears to linger around even after
>> DIH has issued the hard commit. My expectation would be that after the hard
>> commit has occurred, the tlog file will be removed. I'm obviously
>> misunderstanding how this all works.
>>
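The request described above might look something like this sketch. It
assumes DIH is registered at `/dataimport` (a common choice, but it is
whatever name your solrconfig.xml gives it) and uses a hypothetical core URL
and helper name. `commit=true` asks DIH for its single commit once the
import completes.

```python
from urllib.parse import urlencode


def dih_full_import_url(core_url: str, handler: str = "/dataimport") -> str:
    """Build a DIH full-import request that commits once, at the end."""
    params = {
        "command": "full-import",
        # one explicit commit when the whole import has finished
        "commit": "true",
    }
    return core_url.rstrip("/") + handler + "?" + urlencode(params)


print(dih_full_import_url("http://index-host:8983/solr/collection1"))
```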
>
> You've already gotten the reason for the giant tlog hanging around.
>
> The way to actually fix this problem is to turn on autoCommit with one of
> the values set relatively low.  The key to enabling autoCommit without
> changing anything about how your import process works is this: make sure
> that openSearcher is set to false in the autoCommit section:
>
> <updateHandler class="solr.DirectUpdateHandler2">
>   <autoCommit>
>     <maxDocs>25000</maxDocs>
>     <maxTime>300000</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>   <updateLog />
> </updateHandler>
>
> I make maxDocs low rather than maxTime, but that's up to you.  Each hard
> commit done by autoCommit will create a new tlog, and each tlog will be
> fairly small.  Only a few of them will be kept around, so the disk space
> requirement will be small, and restarting Solr will be fast because there
> won't be a lot of data to replay.
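To get a feel for how many tlogs that config produces, here is a rough
estimate under a steady-ingest assumption (the rates below are made up, and
`autocommit_count` is a hypothetical helper). A hard autoCommit fires when
either maxDocs uncommitted documents pile up or maxTime elapses, whichever
comes first, and each one rolls over to a new tlog.

```python
import math


def autocommit_count(num_docs: int, docs_per_sec: float,
                     max_docs: int = 25000, max_time_ms: int = 300000) -> int:
    """Estimate how many hard autoCommits fire during one import.

    Assumes documents arrive at a steady rate, so each commit window holds
    whichever is smaller: maxDocs, or the docs that arrive within maxTime.
    The final partial window is left to DIH's explicit commit at the end.
    """
    if docs_per_sec <= 0:
        raise ValueError("docs_per_sec must be positive")
    docs_per_window = min(max_docs, docs_per_sec * max_time_ms / 1000.0)
    return math.floor(num_docs / docs_per_window)


print(autocommit_count(1_500_000, 1000))  # fast feed: maxDocs wins
print(autocommit_count(1_500_000, 50))    # slow feed: maxTime wins
```

For the 1.5 million document import, the count bottoms out at 60 when
ingest is fast enough that maxDocs wins; a slower feed lets maxTime win and
produces more, smaller tlogs.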
>
> With openSearcher set to false, there will be NO changes in document
> visibility.  Searches will continue using the old searcher, so the old
> documents will still be there and the new documents will NOT be searchable
> until DIH does its explicit commit at the end.
>
> The one thing that I'm not sure about is what happens if Solr or the
> machine crashes in the middle of the import.  Complete rollback might not
> be possible.  Someone with better knowledge may have to comment there.
>
> Thanks,
> Shawn
>
>
