[
https://issues.apache.org/jira/browse/HADOOP-12508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974543#comment-14974543
]
Gaurav Kanade commented on HADOOP-12508:
----------------------------------------
Hey [~cnauroth]
Thanks for the review! acquireLease makes a call to obtain a
SelfRenewingLease. If you look at the SelfRenewingLease class, it keeps
retrying until it finally acquires the lease. The concurrent process that held
the lease is dead, so once its existing hold on the lease expires the lease
will not be renewed (the self-renewing thread is dead as well). This patch
seeks to address this particular condition. The expected behavior is therefore
that the process that died while holding the now-dangling lease will not
attempt to renew it, and the deleting process will keep trying to acquire the
lease until it gets it, which happens once the existing lease expires (the
default lease duration is 60 sec, so the worst-case wait is 60 sec). At a bare
minimum this patch will not break anything that is not already broken, and it
will expose a deeper issue if one exists.
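To make the retry behavior concrete, here is a minimal sketch of the idea,
assuming a purely in-memory stand-in for the blob. LeasedBlobSketch,
LeaseHeldException, deleteWithLeaseRetry and the simulated clock are all
hypothetical names for illustration; this is not the patch itself and not the
Azure SDK or SelfRenewingLease API.

```java
// Hypothetical in-memory model of a leased blob; NOT the Azure storage API.
final class LeasedBlobSketch {

    // Unchecked only to keep the sketch short; raised on a lease conflict.
    static final class LeaseHeldException extends RuntimeException {
        LeaseHeldException(String message) { super(message); }
    }

    private final long leaseDurationMillis;
    private long nowMillis = 0;            // simulated clock, so the sketch runs instantly
    private String activeLeaseId = null;   // null means no lease was ever taken or it was released
    private long leaseExpiresAtMillis = 0;
    private int leaseCounter = 0;
    private boolean exists = true;

    LeasedBlobSketch(long leaseDurationMillis) {
        this.leaseDurationMillis = leaseDurationMillis;
    }

    void advanceClock(long millis) { nowMillis += millis; }

    boolean exists() { return exists; }

    private boolean leaseActive() {
        return activeLeaseId != null && nowMillis < leaseExpiresAtMillis;
    }

    /** Returns a new lease ID, or null while another unexpired lease is held. */
    String tryAcquireLease() {
        if (leaseActive()) {
            return null;
        }
        activeLeaseId = "lease-" + (++leaseCounter);
        leaseExpiresAtMillis = nowMillis + leaseDurationMillis;
        return activeLeaseId;
    }

    /** Deletes the blob; fails if an unexpired lease is held under a different ID. */
    void delete(String leaseId) {
        if (leaseActive() && !activeLeaseId.equals(leaseId)) {
            throw new LeaseHeldException("blob is leased by another holder");
        }
        exists = false;
        activeLeaseId = null;
    }

    /**
     * Tries a plain delete first. On a lease conflict, polls until the lease
     * can be acquired (the dead holder never renews), then deletes with it.
     * Returns the number of failed acquisition attempts.
     */
    static int deleteWithLeaseRetry(LeasedBlobSketch blob, long pollMillis) {
        try {
            blob.delete(null);
            return 0;
        } catch (LeaseHeldException e) {
            int failedPolls = 0;
            String leaseId;
            while ((leaseId = blob.tryAcquireLease()) == null) {
                blob.advanceClock(pollMillis); // stands in for sleeping between attempts
                failedPolls++;
            }
            blob.delete(leaseId);
            return failedPolls;
        }
    }
}
```

In this model, a lease taken by a holder that never renews blocks the delete
for at most one lease duration; after that the acquisition succeeds and the
delete goes through under the new lease.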
As for testing, we are working on designing a framework that can test error
conditions caused by concurrent processes exiting unexpectedly. This seems to
be the class of issues we are hitting; they are exposed by the new HBase test
introduced in HDP 2.3 and seem to occur rarely in practice, as no customer
appears to have hit them yet. In the meantime, if you have ideas about testing
that can be done quickly in the short term, I would love to hear them.
> delete fails with exception when lease is held on blob
> ------------------------------------------------------
>
> Key: HADOOP-12508
> URL: https://issues.apache.org/jira/browse/HADOOP-12508
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Gaurav Kanade
> Assignee: Gaurav Kanade
> Priority: Blocker
> Attachments: HADOOP-12508.01.patch, HADOOP-12508.02.patch
>
>
> The delete function as implemented by AzureNativeFileSystemStore attempts
> the delete without a lease. In most cases this works, but when a process is
> killed and leaves a lease dangling for a short period, a delete attempted
> during that period simply fails with an exception. This fix addresses the
> situation by re-attempting the delete after acquiring the lease.