[ https://issues.apache.org/jira/browse/HADOOP-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Virajith Jalaparti updated HADOOP-17072:
----------------------------------------
    Description: 
In a federated setting (HDFS federation, federation across multiple buckets on 
S3, multiple containers across Azure storage), certain system tools/pipelines 
require the ability to map paths to the clusters/accounts.

Consider GDPR compliance/retention jobs that need to go over the datasets 
ingested over a period of T days and remove/quarantine datasets that are not 
properly annotated or have reached their retention period. Such jobs can rely on 
renames to a global trash/quarantine directory to accomplish their task. However, 
in a federated setting, efficient, atomic renames (such as those within a single 
HDFS cluster) are not supported across the different clusters/shards in the 
federation. As a result, such jobs need to determine the cluster to which each 
path maps.

To address such cases, this JIRA proposes to add two new methods to 
{{FileSystem}}: {{getClusterRoot()}} and {{getClusterRoots()}}.
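
The JIRA does not spell out signatures for the proposed methods, so the following is only a minimal, standalone sketch of the idea, not the Hadoop API or the eventual {{ViewFileSystem}} implementation. It assumes a mount table that maps path prefixes to the root URI of the backing cluster/bucket/container, resolves a path by longest-prefix match, and uses plain {{String}} paths instead of Hadoop's {{Path}} so it compiles without Hadoop on the classpath. The class name {{FederatedNamespaceSketch}} and the helpers {{addMount}} and {{sameCluster}} are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a federated namespace; NOT the Hadoop FileSystem API.
 * Each mount point (e.g. "/data") maps to the root URI of the cluster,
 * bucket, or container that stores everything under it.
 */
class FederatedNamespaceSketch {
  // mount point -> cluster root; TreeMap keeps mount points sorted
  private final Map<String, URI> mounts = new TreeMap<>();

  /** Hypothetical helper: registers a mount point backed by a cluster root. */
  void addMount(String mountPoint, URI clusterRoot) {
    mounts.put(normalize(mountPoint), clusterRoot);
  }

  /** Returns the root of the cluster serving the path (longest matching mount point wins). */
  URI getClusterRoot(String path) {
    String p = normalize(path);
    String best = null;
    for (String mp : mounts.keySet()) {
      // "/" matches everything; otherwise match the mount point itself or a child of it
      boolean matches = mp.equals("/") || p.equals(mp) || p.startsWith(mp + "/");
      if (matches && (best == null || mp.length() > best.length())) {
        best = mp;
      }
    }
    if (best == null) {
      throw new IllegalArgumentException("No mount point for " + path);
    }
    return mounts.get(best);
  }

  /** Returns every distinct cluster root in the namespace. */
  List<URI> getClusterRoots() {
    List<URI> roots = new ArrayList<>();
    for (URI u : mounts.values()) {
      if (!roots.contains(u)) {
        roots.add(u);
      }
    }
    return Collections.unmodifiableList(roots);
  }

  /** Hypothetical helper: true if renaming src to dst stays within one cluster. */
  boolean sameCluster(String src, String dst) {
    return getClusterRoot(src).equals(getClusterRoot(dst));
  }

  // Strip a trailing slash (except for the root "/") so "/data/" and "/data" agree.
  private static String normalize(String path) {
    if (path.length() > 1 && path.endsWith("/")) {
      return path.substring(0, path.length() - 1);
    }
    return path;
  }
}
```

For the retention use case above, a job could call the hypothetical {{sameCluster(src, dst)}} before moving a dataset into a quarantine directory, and fall back to a per-cluster quarantine directory under {{getClusterRoot(src)}} when the rename would cross clusters.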

  was:
In a federated setting (HDFS federation, federation across multiple buckets on 
S3, multiple containers across Azure storage), certain system tools/pipelines 
require the ability to map paths to the clusters/accounts.

Consider GDPR compliance/retention jobs that need to go over the datasets 
ingested over a period of T days and remove/quarantine datasets that are not 
properly annotated or have reached their retention period. Such jobs can rely on 
renames to a global trash/quarantine directory to accomplish their task. However, 
in a federated setting, efficient, atomic renames (such as those within a single 
HDFS cluster) are not supported across the different clusters/shards in the 
federation. As a result, such jobs need to determine the cluster to which each 
path maps.

To address such cases, this JIRA proposed to add two new methods to 
{{FileSystem}}: {{getClusterRoot()}} and {{getClusterRoots()}}.




> Add getClusterRoot and getClusterRoots methods to FileSystem and 
> ViewFileSystem
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-17072
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17072
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs, viewfs
>            Reporter: Virajith Jalaparti
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
