[ 
https://issues.apache.org/jira/browse/HADOOP-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144442#comment-17144442
 ] 

Virajith Jalaparti commented on HADOOP-17072:
---------------------------------------------

Thanks for the feedback! [[email protected]], thanks for listing the 
requirements for FileSystem changes.

I am inclined to agree with [~umamaheswararao]'s suggestion of having this in 
a util class rather than in FileSystem, since sufficient APIs are already 
exposed to enable the same functionality. At this point, any application can 
implement this itself, and there is not much to do at the FS layer.
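
To illustrate what such a util class might look like, here is a minimal 
sketch in plain Java. It is not taken from the attached patch and does not 
call any Hadoop API: the class name ClusterRootUtil, the addMount method, and 
the in-memory mount table are all hypothetical. It assumes the application 
has already obtained a mapping from mount points to cluster roots (for 
example, from its viewfs mount table configuration) and resolves a path by 
longest-prefix match.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper, not part of Hadoop: maps paths to cluster roots. */
final class ClusterRootUtil {
    // Assumed input: mount point (e.g. "/data") -> cluster root (e.g. "hdfs://ns1/").
    private final Map<String, String> mountTable = new LinkedHashMap<>();

    /** Registers a mount point and the root of the cluster/shard backing it. */
    ClusterRootUtil addMount(String mountPoint, String clusterRoot) {
        mountTable.put(normalize(mountPoint), clusterRoot);
        return this;
    }

    /** Returns the cluster root of the longest mount point that contains the path. */
    Optional<String> getClusterRoot(String path) {
        String p = normalize(path);
        String best = null;
        for (String mount : mountTable.keySet()) {
            // Match on whole path components so "/data" does not match "/database".
            boolean matches = mount.equals("/") || p.equals(mount)
                    || p.startsWith(mount + "/");
            if (matches && (best == null || mount.length() > best.length())) {
                best = mount;
            }
        }
        return Optional.ofNullable(best).map(mountTable::get);
    }

    /** Returns all distinct cluster roots known to this table. */
    Set<String> getClusterRoots() {
        return new LinkedHashSet<>(mountTable.values());
    }

    // Strips trailing slashes, keeping "/" itself intact.
    private static String normalize(String path) {
        String s = path;
        while (s.length() > 1 && s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }
}
```

A retention job could then look up getClusterRoot(path) for each dataset and 
rename it into a trash/quarantine directory under that root, keeping the 
rename within a single cluster/shard.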

> Add getClusterRoot and getClusterRoots methods to FileSystem and 
> ViewFilesystem
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-17072
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17072
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs, viewfs
>            Reporter: Virajith Jalaparti
>            Assignee: Virajith Jalaparti
>            Priority: Major
>         Attachments: HADOOP-17072.001.patch
>
>
> In a federated setting (HDFS federation, federation across multiple buckets 
> on S3, multiple containers across Azure storage), certain system 
> tools/pipelines require the ability to map paths to the clusters/accounts.
> Consider the example of GDPR compliance/retention jobs that need to go over 
> various datasets, ingested over a period of T days and remove/quarantine 
> datasets that are not properly annotated/have reached their retention period. 
> Such jobs can rely on renames to a global trash/quarantine directory to 
> accomplish their task. However, in a federated setting, efficient, atomic 
> renames (as those within a single HDFS cluster) are not supported across the 
> different clusters/shards in federation. As a result, such jobs will need to 
> leverage a trash/quarantine directory per cluster/shard. Further, they would 
> need to map from a particular path to the cluster/shard that contains this 
> path.
> To address such cases, this JIRA proposes to add two new methods to 
> {{FileSystem}}: {{getClusterRoot}} and {{getClusterRoots()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

