[ 
https://issues.apache.org/jira/browse/HADOOP-18948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HADOOP-18948:
--------------------------------
    Hadoop Flags: Reviewed
     Description: 
On third-party stores without lifecycle rules it's possible to accrue many GB of 
pending multipart uploads, including from
* magic committer jobs where the Spark driver/MR AM failed before commit/abort
* distcp jobs which time out and get aborted
* any client code writing datasets which are interrupted before close.

Although there's a purge pending uploads option, it is dangerous: if 
any fs is instantiated with it, it can destroy in-flight work.

Otherwise, the "hadoop s3guard uploads" command does work, but it needs 
scheduling or manual execution.

Proposed: add a new property {{fs.s3a.directory.operations.purge.uploads}} 
which will automatically cancel all pending uploads under a path
* delete: everything under the dir
* rename: everything under the source dir
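
As a rough illustration of what "under a path" could mean, here is a minimal sketch that picks the pending uploads whose keys sit below a directory key. The class and method names are hypothetical and this is not the S3A code; it only assumes that uploads can be matched on a key prefix ending in "/".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of prefix matching; not the S3A implementation. */
class PurgeUploadsSketch {

    /** Append a trailing "/" so that "data/job1" does not match "data/job10/...". */
    static String toDirPrefix(String dirKey) {
        if (dirKey.isEmpty() || dirKey.endsWith("/")) {
            return dirKey;
        }
        return dirKey + "/";
    }

    /** Return the keys of the pending uploads located under the directory. */
    static List<String> uploadsUnder(String dirKey, List<String> pendingUploadKeys) {
        String prefix = toDirPrefix(dirKey);
        List<String> matches = new ArrayList<>();
        for (String key : pendingUploadKeys) {
            if (key.startsWith(prefix)) {
                matches.add(key);
            }
        }
        return matches;
    }
}
```

In this sketch a delete of a directory would pass the directory's key, and a rename would pass the key of the source directory.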

This will be done in parallel to the normal operation, but with no attempt to 
post abortMultipartUploads in different threads. The assumption here is that 
this is rare. It'll be off by default, as on AWS people should have lifecycle 
rules for these things.
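
One way to picture the threading model described above is the sketch below: a single background task aborts the uploads one after another while the caller carries on with the delete or rename, and the number of aborts is returned so it could feed a counter. Every name here is hypothetical and the abort call is a stand-in callback, so treat it as an assumption-laden sketch rather than how S3A does it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch of purging alongside a directory operation. */
class ParallelPurgeSketch {

    /**
     * Run mainOperation on the calling thread while one background task
     * aborts the uploads sequentially. Returns the number of aborts issued.
     */
    static int runWithPurge(boolean purgeEnabled,
                            List<String> uploadIds,
                            Consumer<String> abortUpload,
                            Runnable mainOperation) {
        AtomicInteger aborted = new AtomicInteger();
        CompletableFuture<Void> purge;
        if (purgeEnabled) {
            // One task only: aborts are not fanned out across threads.
            purge = CompletableFuture.runAsync(() -> {
                for (String id : uploadIds) {
                    abortUpload.accept(id);
                    aborted.incrementAndGet();
                }
            });
        } else {
            // Off by default: nothing is aborted.
            purge = CompletableFuture.completedFuture(null);
        }
        mainOperation.run();   // the delete or rename itself
        purge.join();          // wait for the purge before returning
        return aborted.get();
    }
}
```

Keeping the aborts in one task matches the stated assumption that pending uploads under a directory are rare; a per-upload thread pool would only pay off if that assumption turned out to be wrong.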


+ doc (third_party?)
+ add new counter/metric for abort operations, count and duration
+ test to include cost assertions





> S3A. Add option fs.s3a.directory.operations.purge.uploads to purge on 
> rename/delete
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-18948
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18948
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.7-aws
>
>
> On third-party stores without lifecycle rules it's possible to accrue many GB 
> of pending multipart uploads, including from
> * magic committer jobs where the Spark driver/MR AM failed before commit/abort
> * distcp jobs which time out and get aborted
> * any client code writing datasets which are interrupted before close.
> Although there's a purge pending uploads option, it is dangerous: if 
> any fs is instantiated with it, it can destroy in-flight work.
> Otherwise, the "hadoop s3guard uploads" command does work, but it needs 
> scheduling or manual execution.
> Proposed: add a new property {{fs.s3a.directory.operations.purge.uploads}} 
> which will automatically cancel all pending uploads under a path
> * delete: everything under the dir
> * rename: everything under the source dir
> This will be done in parallel to the normal operation, but with no attempt to 
> post abortMultipartUploads in different threads. The assumption here is that 
> this is rare. It'll be off by default, as on AWS people should have lifecycle 
> rules for these things.
> + doc (third_party?)
> + add new counter/metric for abort operations, count and duration
> + test to include cost assertions



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

