[
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278416#comment-16278416
]
Steve Loughran edited comment on HADOOP-15086 at 12/5/17 11:24 AM:
-------------------------------------------------------------------
bq. Looks like we can use conditional-headers to implement an atomic file
rename on Azure blob storage
Re-opening then, and assigning to you. Tests are expected, and for this kind of
change, anything you can do to prove correctness.
What would be the plan?
* enumerate the relevant conditional headers
* use them in the conditional rename?
What about:
* something else adding something to the dest path during the rename process?
* something deleting the parent dir & replacing it with a file?
* something deleting files added to the dest path during the rename process?
* failure of the rename process partway through?
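To make the race concrete: the idea behind conditional headers is that an
"If-None-Match: \*"-style precondition folds the "does the destination exist?"
check and the copy into one atomic step, so two concurrent renamers to the same
destination cannot both win. A minimal sketch, using a purely in-memory toy
store (the names here are illustrative, not the WASB or Azure SDK API):

```python
import threading

class ToyStore:
    """In-memory stand-in for a blob store; not the Azure or WASB API."""
    def __init__(self):
        self._lock = threading.Lock()
        self._blobs = {}

    def put(self, path, data):
        with self._lock:
            self._blobs[path] = data

    def get(self, path):
        with self._lock:
            return self._blobs.get(path)

    def put_if_none_match(self, path, data):
        # Analogue of an HTTP "If-None-Match: *" conditional PUT:
        # create the blob only if nothing exists at that path.
        with self._lock:
            if path in self._blobs:
                return False
            self._blobs[path] = data
            return True

    def delete(self, path):
        with self._lock:
            self._blobs.pop(path, None)

def rename(store, src, dst):
    """Copy-then-delete rename. The conditional PUT makes the existence
    check and the copy a single atomic step, so at most one concurrent
    renamer can claim the destination."""
    data = store.get(src)
    if data is None:
        return False
    if not store.put_if_none_match(dst, data):
        return False  # another renamer created dst first
    store.delete(src)
    return True
```

Note that this only protects the single-blob case; the directory-level
scenarios in the list above (deletions, replacement of the parent, partial
failure) are exactly what a per-blob precondition cannot cover.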
I think you really need to consider that you can't guarantee atomicity here,
though you may stand a chance of improving what you get today. Before doing
that, it'd be worth getting the opinions of the Wasb team, in the form of
[~tmarquardt].
bq. I think it's not necessary to introduce object-store specific committers
when a storage already supports atomic operations.
That depends on whether the store consistently does rename as O(1); if it's
O(data), what works in demo & test will underperform in production. Azure does
do its renames faster than S3...we should really have some (scalable)
micro-benchmark integration test for both stores which creates a file, does the
rename & prints its bandwidth. I think this is mostly done on both already, but
not really quantified across stores.
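Such a micro-benchmark is easy to sketch. The version below runs against the
local filesystem purely as a placeholder; the real integration test would
exercise Hadoop's {{FileSystem.rename()}} against wasb:// and s3a:// across a
range of sizes, where an O(data) rename shows up as bandwidth that stays flat
or falls as the file grows:

```python
import os
import tempfile
import time

def rename_bandwidth(size_bytes):
    """Create a file of size_bytes, rename it, and report apparent MB/s.
    A POSIX rename is O(1), so locally this number grows with file size;
    an O(data) object-store 'rename' (copy + delete) would not."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src")
        dst = os.path.join(d, "dst")
        with open(src, "wb") as f:
            f.write(b"\0" * size_bytes)
        t0 = time.monotonic()
        os.rename(src, dst)
        elapsed = max(time.monotonic() - t0, 1e-9)  # guard divide-by-zero
        return size_bytes / elapsed / 1e6
```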
The good news: the Hadoop FileOutputCommitters don't have atomic task commit or
job commit anyway; instead they hope that the probability of a failure during
task commit is low (which it is when rename is fast enough), and expect the job
itself to detect and react to the failure/timeout of task commit and job commit
(v1 can recover with all committed tasks retained; v2 fails the job & so
delivers at-most-once). So, if you can make the atomicity semantics of the
commit algorithms closer to those of HDFS, those algorithms may work better.
However, I fear that you can't do enough for v1, not if you intend to support
job restart (Spark doesn't do this, I know, but MR does). The v1 algorithm
lists task attempt ID paths under {{$dest/_temporary/$appAttemptId}} and
assumes that if a path is there, that task has been completed. Therefore, if a
{{$dest/_temporary/$appAttemptId/$taskAttemptId}} path is visible before the
rename completes, rename() doesn't meet the expectations of
{{FileOutputCommitter}}...Better to write a manifest file to
{{$dest/_temporary/$appAttemptId}} for every task attempt & rely on PUT being
atomic on that write, at which point, yes, you are in the world of custom
committers.
> NativeAzureFileSystem.rename is not atomic
> ------------------------------------------
>
> Key: HADOOP-15086
> URL: https://issues.apache.org/jira/browse/HADOOP-15086
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 2.7.3
> Reporter: Shixiong Zhu
> Attachments: RenameReproducer.java
>
>
> When multiple threads rename files to the same target path, more than one
> thread can succeed. This is because the existence check and the file copy in
> `rename` are not atomic.
> I would expect it to be atomic, just like HDFS rename.