[
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278416#comment-16278416
]
Steve Loughran edited comment on HADOOP-15086 at 12/5/17 11:24 AM:
-------------------------------------------------------------------
bq. Looks like we can use conditional-headers to implement an atomic file
rename on Azure blob storage
Re-opening then, and assigning to you. Tests are expected, and for this kind of
change, anything you can do to prove correctness.
What would be the plan?
* enumerate the relevant conditional headers
* use them in the conditional rename?
What about:
* something else adding something to the dest path during the rename process?
* something deleting the parent dir & replacing it with a file?
* something deleting files added to the dest path during the rename process?
* failure of the rename process partway through?
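To make the race concrete: the idea behind conditional headers is that an
"If-None-Match: \*"-style precondition folds the "does the destination exist?"
check and the copy into one atomic step, so two concurrent renamers to the same
destination cannot both win. A minimal sketch, using a purely in-memory toy
store (the names here are illustrative, not the WASB or Azure SDK API):

```python
import threading

class ToyStore:
    """In-memory stand-in for a blob store; not the Azure or WASB API."""
    def __init__(self):
        self._lock = threading.Lock()
        self._blobs = {}

    def put(self, path, data):
        with self._lock:
            self._blobs[path] = data

    def get(self, path):
        with self._lock:
            return self._blobs.get(path)

    def put_if_none_match(self, path, data):
        # Analogue of an HTTP "If-None-Match: *" conditional PUT:
        # create the blob only if nothing exists at that path.
        with self._lock:
            if path in self._blobs:
                return False
            self._blobs[path] = data
            return True

    def delete(self, path):
        with self._lock:
            self._blobs.pop(path, None)

def rename(store, src, dst):
    """Copy-then-delete rename. The conditional PUT makes the existence
    check and the copy a single atomic step, so at most one concurrent
    renamer can claim the destination."""
    data = store.get(src)
    if data is None:
        return False
    if not store.put_if_none_match(dst, data):
        return False  # another renamer created dst first
    store.delete(src)
    return True
```

Note that this only protects the single-blob case; the directory-level
scenarios in the list above (deletions, replacement of the parent, partial
failure) are exactly what a per-blob precondition cannot cover.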
I think you really need to consider that you can't guarantee atomicity here,
though you may stand a chance of improving what you get today. Before doing
that, it'd be worth getting the opinions of the Wasb team, in the form of
[~tmarquardt].
bq. I think it's not necessary to introduce object-store specific committers
when a storage already supports atomic operations.
That depends on whether the store consistently does rename as O(1); if it's
O(data), what works in demo & test will underperform in production. Azure does
do its renames faster than S3...we should really have some (scalable)
micro-benchmark integration test for both stores which creates a file, does the
rename & prints its bandwidth. I think this is mostly done on both already, but
not really quantified across stores.
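Such a micro-benchmark is easy to sketch. The version below runs against the
local filesystem purely as a placeholder; the real integration test would
exercise Hadoop's {{FileSystem.rename()}} against wasb:// and s3a:// across a
range of sizes, where an O(data) rename shows up as bandwidth that stays flat
or falls as the file grows:

```python
import os
import tempfile
import time

def rename_bandwidth(size_bytes):
    """Create a file of size_bytes, rename it, and report apparent MB/s.
    A POSIX rename is O(1), so locally this number grows with file size;
    an O(data) object-store 'rename' (copy + delete) would not."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src")
        dst = os.path.join(d, "dst")
        with open(src, "wb") as f:
            f.write(b"\0" * size_bytes)
        t0 = time.monotonic()
        os.rename(src, dst)
        elapsed = max(time.monotonic() - t0, 1e-9)  # guard divide-by-zero
        return size_bytes / elapsed / 1e6
```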
The good news: the Hadoop FileOutputCommitters don't have atomic task commit or
job commit anyway; instead they hope that the probability of a failure during
task commit is low (which it is when rename is fast enough), and expect the job
itself to detect and react to the failure/timeout of task commit and job commit
(v1 can recover with all committed tasks retained; v2 fails the job & so
delivers at-most-once). So, if you can make the atomicity semantics of the
commit algorithms closer to those of HDFS, those algorithms may work better.
However, I fear that you can't do enough for v1, not if you intend to support
job restart (Spark doesn't do this, I know, but MR does). The v1 algorithm
lists task attempt ID paths under {{$dest/_temporary/$appAttemptId}} and
assumes that if a path is there, that task has been completed. Therefore, if a
{{$dest/_temporary/$appAttemptId/$taskAttemptId}} path is visible before the
rename completes, rename() doesn't meet the expectations of
{{FileOutputCommitter}}...Better to write a manifest file to
{{$dest/_temporary/$appAttemptId}} for every task attempt & rely on PUT being
atomic on that write, at which point, yes, you are in the world of custom
committers.
> NativeAzureFileSystem.rename is not atomic
> ------------------------------------------
>
> Key: HADOOP-15086
> URL: https://issues.apache.org/jira/browse/HADOOP-15086
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 2.7.3
> Reporter: Shixiong Zhu
> Attachments: RenameReproducer.java
>
>
> When multiple threads rename files to the same target path, more than one
> thread can succeed. This is because the existence check and the file copy in
> `rename` are not atomic.
> I would expect it to be atomic, just like HDFS rename.