[ https://issues.apache.org/jira/browse/HADOOP-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096975#comment-15096975 ]

Chris Nauroth commented on HADOOP-12635:
----------------------------------------

[~dchickabasapa], I thought more about your proposal around the {{create}} 
problem.  I'd like to point out a few more things about HDFS lease semantics:
* If process A holds a lease and process B concurrently calls {{delete}}, then 
the {{delete}} is allowed and process A's lease is removed.
* Similarly, if process A holds a lease and process B concurrently calls 
{{create}} with the {{OVERWRITE}} option, then process B wins and process A's 
lease is removed.  (This is like delete + create.)
* Also similarly, if process A holds a lease and process B concurrently calls 
{{rename}} with the {{OVERWRITE}} option, then the {{rename}} is allowed and 
process A's lease is removed.  (This is like delete + rename.)
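The lease rules above can be illustrated with a small in-memory model. This is a hypothetical sketch, not the NameNode's lease manager: the class {{LeaseModel}} and its methods are illustrative names, and the only behavior modeled is that {{delete}}, {{create}} with {{OVERWRITE}}, and {{rename}} with {{OVERWRITE}} all take effect even while another client holds a lease, and that the displaced writer can no longer write.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified in-memory model of the HDFS lease rules above.
 * It is not the NameNode's lease manager; all names are illustrative.
 */
public class LeaseModel {
    // path -> client currently holding the write lease on that path
    private final Map<String, String> leases = new HashMap<>();
    // path -> file contents; stands in for the namespace
    private final Map<String, StringBuilder> files = new HashMap<>();

    /** Write data; allowed only while the caller still holds the lease. */
    public void write(String client, String path, String data) {
        if (!client.equals(leases.get(path)) || !files.containsKey(path)) {
            throw new IllegalStateException(client + " does not hold a lease on " + path);
        }
        files.get(path).append(data);
    }

    /** delete succeeds even if another client holds a lease; that lease is dropped. */
    public void delete(String path) {
        files.remove(path);
        leases.remove(path);
    }

    /** create with OVERWRITE acts like delete + create: the new creator holds the lease. */
    public void createOverwrite(String client, String path) {
        delete(path);
        files.put(path, new StringBuilder());
        leases.put(path, client);
    }

    /** rename with OVERWRITE acts like delete(dst) + rename; a lease on src moves with the file. */
    public void renameOverwrite(String src, String dst) {
        StringBuilder contents = files.get(src);
        if (contents == null) {
            throw new IllegalStateException("no such file: " + src);
        }
        delete(dst);            // drops any lease another client held on dst
        files.remove(src);
        files.put(dst, contents);
        String holder = leases.remove(src);
        if (holder != null) {
            leases.put(dst, holder);
        }
    }

    /** Current contents of a file, or null if it does not exist. */
    public String contents(String path) {
        StringBuilder sb = files.get(path);
        return sb == null ? null : sb.toString();
    }

    public static void main(String[] args) {
        LeaseModel fs = new LeaseModel();
        fs.createOverwrite("A", "/data/f");       // process A is writing /data/f
        fs.write("A", "/data/f", "a1");
        fs.createOverwrite("B", "/tmp/g");
        fs.renameOverwrite("/tmp/g", "/data/f");  // process B replaces A's file via rename
        try {
            fs.write("A", "/data/f", "a2");
        } catch (IllegalStateException e) {
            // prints: rejected: A does not hold a lease on /data/f
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the model is the last step: on HDFS, process A learns that it lost the file because its lease is gone. The question below is whether WASB gives process A any equivalent signal.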

Suppose an append is in progress to a WASB file in process A and another 
process B does one of the above actions concurrently.  Is Azure Storage going 
to allow process A to continue writing and committing blocks, even though 
process A is no longer really operating on the same file that it started?

According to this document on [Managing Concurrency in Microsoft Azure 
Storage|https://azure.microsoft.com/en-us/documentation/articles/storage-concurrency/],
the default concurrency model is last writer wins.  The document then describes 
options for pessimistic locking (leases, which we use for a limited subset of 
operations) or optimistic locking (ETags with If-Match conditions, which we don't use 
anywhere).  If both process A and process B are allowed to proceed writing 
blocks, then the end result is likely to be a meaningless file.  In some cases, 
that could be severe enough that end users would classify it as data loss.
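To make the difference between the two models concrete, here is a hypothetical sketch. {{EtagBlob}} is a simplified in-memory stand-in, not the Azure Storage SDK: a version counter plays the role of the ETag, {{commit}} models an unconditional (last writer wins) block commit, and {{commitIfMatch}} models a commit guarded by an ETag precondition in the spirit of If-Match. The interleaving in {{main}} follows the scenario above, where process B recreates the file while process A is still appending.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a blob that commits blocks either unconditionally
 * (last writer wins) or conditionally on an ETag (optimistic concurrency).
 * Not the Azure Storage API; names and behavior are illustrative assumptions.
 */
public class EtagBlob {
    private final List<String> blocks = new ArrayList<>();
    private long version = 0; // bumped on every change; stands in for an ETag

    public synchronized String etag() {
        return "v" + version;
    }

    /** Replace the blob with empty content, as a create with overwrite would. */
    public synchronized void overwrite() {
        blocks.clear();
        version++;
    }

    /** Unconditional commit: always succeeds, whoever changed the blob in between. */
    public synchronized void commit(String block) {
        blocks.add(block);
        version++;
    }

    /** Conditional commit: fails if the blob changed since expectedEtag was observed. */
    public synchronized String commitIfMatch(String block, String expectedEtag) {
        if (!etag().equals(expectedEtag)) {
            throw new IllegalStateException(
                "precondition failed: expected " + expectedEtag + ", blob is at " + etag());
        }
        blocks.add(block);
        version++;
        return etag(); // caller keeps this for its next conditional commit
    }

    public synchronized List<String> blocks() {
        return new ArrayList<>(blocks);
    }

    public static void main(String[] args) {
        // Last writer wins: A keeps appending after B recreated the blob.
        EtagBlob lww = new EtagBlob();
        lww.commit("A1");
        lww.overwrite();                   // process B recreates the file
        lww.commit("B1");
        lww.commit("A2");                  // process A is unaware and keeps going
        System.out.println(lww.blocks());  // [B1, A2]: a mix of both writers

        // Optimistic: A remembers the ETag returned by its last commit.
        EtagBlob opt = new EtagBlob();
        String aEtag = opt.commitIfMatch("A1", opt.etag());
        opt.overwrite();
        opt.commit("B1");
        try {
            opt.commitIfMatch("A2", aEtag);
        } catch (IllegalStateException e) {
            System.out.println("A's commit rejected: " + e.getMessage());
        }
        System.out.println(opt.blocks());  // [B1]
    }
}
```

In the first case nothing stops process A, and the blob ends up holding blocks from both writers. In the second case process A's stale ETag causes its commit to fail, which is the kind of signal the HDFS lease gives.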

I cannot think of any complete solution to this problem other than adding 
comprehensive locking across these operations.  That would be a much bigger 
change than what has been proposed in the current patch revisions.  That also 
might harm performance of WASB if applications need to make extra remote calls 
for lease acquisition and run additional lease renewer threads.  (The 
optimistic locking strategy might not suffer from this problem.)

I think the strategy of trying to use metadata checks as a guard is incomplete, 
because it doesn't cover the possibility that another process, unaware of what 
happened, might continue trying to write blocks at the same blob path.  If 
there is in fact some other protection mechanism in Azure Storage that I'm 
unaware of and WASB is taking advantage of it, please let me know.  Thanks!
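The gap in a metadata guard is a classic check-then-act race. The sketch below is hypothetical and does not reproduce the code in the attached patches: the metadata key {{appendOwner}} and the class {{MetadataGuardRace}} are illustrative names, and the model assumes only that the store itself does not re-validate the metadata when a block is written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of a metadata-based append guard. The key "appendOwner"
 * and all names are illustrative; this is not the code from the patches.
 */
public class MetadataGuardRace {
    private final Map<String, String> metadata = new HashMap<>();
    private final List<String> blocks = new ArrayList<>();

    /** A writer marks itself as the appender in the blob's metadata. */
    public void startAppend(String writer) {
        metadata.put("appendOwner", writer);
    }

    /** The guard: a writer may proceed if the metadata names it as the appender. */
    public boolean guardAllows(String writer) {
        return writer.equals(metadata.get("appendOwner"));
    }

    /** Another process recreates the blob and claims it in metadata. */
    public void overwriteBy(String writer) {
        blocks.clear();
        metadata.put("appendOwner", writer);
    }

    /** Raw block write: the store does not re-check the metadata here. */
    public void writeBlock(String block) {
        blocks.add(block);
    }

    public List<String> blocks() {
        return new ArrayList<>(blocks);
    }

    public static void main(String[] args) {
        MetadataGuardRace blob = new MetadataGuardRace();
        blob.startAppend("A");                    // process A starts an append

        boolean allowed = blob.guardAllows("A");  // A's check passes ...
        blob.overwriteBy("B");                    // ... then B recreates the blob
        blob.writeBlock("B1");
        if (allowed) {
            blob.writeBlock("A1");                // ... and A's stale write still lands
        }
        System.out.println(blob.blocks());        // [B1, A1]
    }
}
```

A later call to {{guardAllows("A")}} would return false, but by then process A's block has already been written into process B's file. Closing that window requires the check and the write to be a single atomic operation on the service side, which is what a lease or an ETag precondition provides.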


> Adding Append API support for WASB
> ----------------------------------
>
>                 Key: HADOOP-12635
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12635
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: azure
>    Affects Versions: 2.7.1
>            Reporter: Dushyanth
>            Assignee: Dushyanth
>         Attachments: Append API.docx, HADOOP-12635-004.patch, 
> HADOOP-12635.001.patch, HADOOP-12635.002.patch, HADOOP-12635.003.patch
>
>
> Currently the WASB implementation of the HDFS interface does not support 
> the Append API. This JIRA is added to design and implement Append API support 
> in WASB. The intended Append support would allow only a single writer.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
