[
https://issues.apache.org/jira/browse/BOOKKEEPER-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694398#comment-13694398
]
Sijie Guo commented on BOOKKEEPER-636:
--------------------------------------
good catch, Vinay. but the patch make things complicated, you don't need to
introduce another file, it is easy to get the max journal file id from journal
directory, and you just need to guarantee next journal id is large than max
journal file id in journal directory when startup.
And I think we don't need to add another method, we should use
System.currentTimeMillis and add a comment about this case.
attach a patch to simplify things.
> Latest txn logs might be deleted in a race condition which is not recoverable
> if BK goes down before next txn log created.
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-636
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-636
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-server
> Affects Versions: 4.2.1, 4.3.0
> Reporter: Vinay
> Assignee: Vinay
> Priority: Blocker
> Fix For: 4.2.2, 4.3.0
>
> Attachments: BOOKKEEPER-636.diff, BOOKKEEPER-636.patch
>
>
> With the following scenario latest transaction log can be deleted.
> 1. more than {{journalMaxBackups}} txn logs are there in journal dir.
> 2. BK machine was up for long time and the latest txn log id is some what
> huge number
> 3. Now reboot the machine.
> 4. after reboot BK restarted.
> 5. Now, Immediately after startup, One entry is added, due to which Synthread
> rolled the lastMark in ledger dirs before the lastLogId updated by Journal
> thread. (this lastMark was having the old logId which was before reboot).
> 6. Now after roll, old journal txn logs were gc'ed. *Now latest created the
> txn log was deleted.*
> 7. After this Journal thread updated the lastLogMark, also some more rolls
> happened.
> 8. Now BK restarted again. But BK was not able to start because it was not
> able to find the latest txn log file in journal directory.
> {noformat}java.io.IOException: Recovery log 264564 is missing
> at org.apache.bookkeeper.bookie.Journal.replay(Journal.java:424)
> at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:547)
> at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:603)
> at
> org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:111){noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira