[ 
https://issues.apache.org/jira/browse/HADOOP-12508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974543#comment-14974543
 ] 

Gaurav Kanade commented on HADOOP-12508:
----------------------------------------

Hey [~cnauroth]

Thanks for the review! acquireLease makes a call to obtain a 
SelfRenewingLease. If you look at the SelfRenewingLease class, it keeps 
waiting until it finally acquires the lease. The concurrent process that held 
the lease is dead, so once its existing hold on the lease expires it will not 
be renewed (the self-renewing thread is dead as well). This patch addresses 
that particular condition. The expected behavior is therefore that the process 
that died while holding the dangling lease does not attempt to renew, and the 
deleting process keeps trying to acquire the lease until it gets it, which 
happens once the existing lease expires (the default lease holding time is 
60 sec, so the worst-case wait is 60 sec). At a bare minimum this patch will 
not break anything that is not already broken, and it will expose a deeper 
issue if one exists.
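
To make the expected timing concrete, here is a minimal sketch in Java. It is 
not the SelfRenewingLease code: FakeLease, acquireWhenFree, the injected clock 
and the 1 sec retry interval are all hypothetical stand-ins, and only the 
60 sec lease duration comes from the comment above.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

public class DanglingLeaseSketch {

    /** Hypothetical in-memory stand-in for a blob lease; not the Azure SDK. */
    static final class FakeLease {
        private final long durationMillis;
        private String holder;   // null until someone acquires the lease
        private long expiresAt;  // time at which the current hold lapses

        FakeLease(long durationMillis) {
            this.durationMillis = durationMillis;
        }

        /** Grants the lease only if it is free or the previous hold has expired. */
        synchronized boolean tryAcquire(String who, long now) {
            if (holder != null && now < expiresAt) {
                return false; // still held, e.g. by a dead process that never renews
            }
            holder = who;
            expiresAt = now + durationMillis;
            return true;
        }
    }

    /**
     * Keeps retrying until the lease is granted and returns the total time
     * spent waiting. The clock and sleep are injected so the sketch can run
     * without real delays.
     */
    static long acquireWhenFree(FakeLease lease, String who, LongSupplier clock,
                                LongConsumer sleep, long retryMillis) {
        long waited = 0;
        while (!lease.tryAcquire(who, clock.getAsLong())) {
            sleep.accept(retryMillis);
            waited += retryMillis;
        }
        return waited;
    }

    public static void main(String[] args) {
        AtomicLong now = new AtomicLong(0);      // simulated clock in milliseconds
        FakeLease lease = new FakeLease(60_000); // 60 sec lease, as in the comment

        lease.tryAcquire("dead-process", now.get()); // holder dies right after this
        now.set(1_000);                              // the delete starts 1 sec later

        long waited = acquireWhenFree(lease, "deleter", now::get, now::addAndGet, 1_000);
        System.out.println("waited " + waited + " ms before acquiring the lease");
    }
}
```

Because nothing renews the dead holder's lease, the wait in this sketch is 
bounded by the lease duration, which is the worst case described above.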

As for testing, we are designing a framework that can test error conditions 
caused by concurrent processes exiting unexpectedly. This seems to be the 
class of issues we are hitting; they are exposed by the new HBase test 
introduced in HDP 2.3 and seem to occur rarely in practice, as no customer 
seems to have hit them yet. In the meantime, if you have ideas for testing 
that can be done quickly in the short term, I would love to hear them.

> delete fails with exception when lease is held on blob
> ------------------------------------------------------
>
>                 Key: HADOOP-12508
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12508
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Gaurav Kanade
>            Assignee: Gaurav Kanade
>            Priority: Blocker
>         Attachments: HADOOP-12508.01.patch, HADOOP-12508.02.patch
>
>
> The delete function as implemented by AzureNativeFileSystem store attempts 
> delete without a lease. In most cases this works, but when a process is 
> killed and leaves a lease dangling for a short period, a delete attempted 
> during that period simply fails with an exception. This fix addresses the 
> situation by re-attempting the delete after acquiring a lease.
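
The retry described in the issue can be sketched roughly as follows. This is 
only an illustration of the idea, not the patch itself: InMemoryStore, 
LeaseHeldException and deleteWithRetry are hypothetical names, and the fake 
acquireLease returns immediately where a real one would wait for the dangling 
lease to expire.

```java
import java.util.HashMap;
import java.util.Map;

public class DeleteWithLeaseSketch {

    /** Hypothetical exception standing in for the store's "lease is held" failure. */
    static class LeaseHeldException extends RuntimeException {
        LeaseHeldException(String key) {
            super("lease held on " + key);
        }
    }

    /** Hypothetical in-memory blob store; not the Azure or Hadoop API. */
    static final class InMemoryStore {
        // blob key -> ID of the lease currently held on it (null if unleased)
        private final Map<String, String> leaseByKey = new HashMap<>();

        void put(String key, String leaseId) {
            leaseByKey.put(key, leaseId);
        }

        boolean exists(String key) {
            return leaseByKey.containsKey(key);
        }

        /** Simulates acquiring a fresh lease after the old one has expired. */
        String acquireLease(String key) {
            String id = "lease-" + key;
            leaseByKey.put(key, id);
            return id;
        }

        /** Deletes the blob; fails if a lease is held and the caller does not present it. */
        void delete(String key, String leaseId) {
            String current = leaseByKey.get(key);
            if (current != null && !current.equals(leaseId)) {
                throw new LeaseHeldException(key);
            }
            leaseByKey.remove(key);
        }
    }

    /** Tries a plain delete first; on a lease conflict, acquires a lease and retries. */
    static void deleteWithRetry(InMemoryStore store, String key) {
        try {
            store.delete(key, null);
        } catch (LeaseHeldException e) {
            String leaseId = store.acquireLease(key);
            store.delete(key, leaseId);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put("blob", "dangling-lease"); // lease left behind by a killed process
        deleteWithRetry(store, "blob");
        System.out.println("blob still exists: " + store.exists("blob"));
    }
}
```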



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
