[
https://issues.apache.org/jira/browse/PHOENIX-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Himanshu Gwalani resolved PHOENIX-7672.
---------------------------------------
Resolution: Fixed
> Replication Log Replay should acquire lease on unclosed file before processing
> ------------------------------------------------------------------------------
>
> Key: PHOENIX-7672
> URL: https://issues.apache.org/jira/browse/PHOENIX-7672
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Himanshu Gwalani
> Assignee: Himanshu Gwalani
> Priority: Major
> Fix For: PHOENIX-7562-feature
>
>
> As of now, if a replication log file is not closed gracefully and reader
> start reading it before HDFS lease timeout on the file, it can lead to
> partial data being read by the reader. Hence the replication replay must
> ensure either file is closed or if not, acquire the lease and accordingly
> validate header and trailer in the file before processing it.
> Pseudo code (by Andrew Purtell)
> {code:java}
> isClosed = ((LeaseRecoverable) fs).isFileClosed(filePath); // May throw
> ClassCastException - bad!
> if (isClosed) {
> // This will assert that the file has a valid header and trailer.
> reader = createLogReader(filePath); // may throw IOE
> } else {
> // Recover lease via custom method using LeaseRecoverable#recoverLease.
> // Wait until we get the lease with retries.
> recoverLease(filePath); // may throw IOE
> if (fs.getFileStatus(filePath).getLen() > 0) { // may throw IOE
> try {
> // This will assert that the file has a valid header and trailer.
> reader = createLogReader(filePath); // may throw IOE
> } catch (IOException e) { // how about MissingTrailerException to
> give clarity?
> // check some exception details to confirm it was a trailer issue.
> // if not a trailer issue, just rethrow the exception. otherwise,
> // should we continue even though the file is truncated? we are
> never going
> // to get that truncated data back, whatever it was. ignoring the
> whole
> // file converts potential data loss into certain data loss.
> LOG.warn("Replication log file {} is missing its trailer,
> continuing", filePath);
> reader = createLogReader(filePath, false); // may throw another
> IOE
> }
> } else {
> // Ignore the file.
> LOG.info("Ignoring zero length replication log file {}", filePath);
> }
> }
> // Clean up. Remove the replication log file at filePath. {code}
> After PHOENIX-7669, the low level reader would throw appropriate exceptions
> (MissingTrailerException, MissingHeaderException) to signal why it failed to
> initialze the reader (in cases when file is unclosed/corrupt)
> In HBase, there is already a utility class
> ([RecoverLeaseFSUtils|https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/util/RecoverLeaseFSUtils.java#L88C22-L88C38])
> for this which hasĀ
> [recoverFileLease|https://github.com/apache/hbase/blob/c039e446ed73d7bda96e471be28946ebebc46144/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/util/RecoverLeaseFSUtils.java#L88C22-L98]
> method implemented (along with retries). This can be brought in to Phoenix
> Sources and updated as required.
> DOD: Acquire lease logic as mentioned in above pseudo code and handle the
> exceptions appropriately while creating Replication Log Reader.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)