[ 
https://issues.apache.org/jira/browse/HADOOP-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141534#comment-17141534
 ] 

Steve Loughran commented on HADOOP-17072:
-----------------------------------------

 Having looked at the patch only briefly, here's what we need for 
anything which proposes changes to FileSystem:
 
* works well with object stores
* works well with HDFS
* uses hasPathCapability() with a new capability to let callers dynamically 
determine if an FS implements a feature before invoking the method
* every new operation MUST be added to FileContext as well as FileSystem
* adds a new (strict) specification in the filesystem spec docs, where you 
really do get to define what it is meant to do in a way we can derive both 
implementations and tests without going "let's just look at what viewfs does 
and just copy it"
* comes with the FS contract tests derived from the specification.
* doesn't cause any regressions.
* doesn't accidentally bypass the filter filesystems & checksum 
creation/validation etc.
* tagged as unstable or evolving

+ some other guidance in the javadocs at the top of FileSystem.
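
To illustrate the hasPathCapability() point in the list above, here is a minimal, self-contained sketch of the "probe before you call" pattern. It is not taken from the patch. FsLike is a hypothetical stand-in for FileSystem (the real hasPathCapability() takes a Path and a String and can throw IOException), the capability name "fs.capability.cluster.root" is invented for illustration, and the mount-point logic in FederatedFs is a toy.

```java
import java.util.Optional;

/**
 * Sketch only: models how a caller could probe for a feature before
 * invoking it. None of these types are Hadoop classes.
 */
public class CapabilityProbeSketch {

    // Hypothetical capability name; a real patch would define its own constant.
    static final String CLUSTER_ROOT = "fs.capability.cluster.root";

    /** Hypothetical stand-in for the relevant slice of FileSystem. */
    interface FsLike {
        boolean hasPathCapability(String path, String capability);
        String getClusterRoot(String path);
    }

    /** A store without the feature: reports false and throws if called anyway. */
    static final class PlainFs implements FsLike {
        public boolean hasPathCapability(String path, String capability) {
            return false;
        }
        public String getClusterRoot(String path) {
            throw new UnsupportedOperationException("getClusterRoot");
        }
    }

    /** A toy federated store: treats the first path component as the cluster root. */
    static final class FederatedFs implements FsLike {
        public boolean hasPathCapability(String path, String capability) {
            return CLUSTER_ROOT.equals(capability);
        }
        public String getClusterRoot(String path) {
            int slash = path.indexOf('/', 1);
            return slash < 0 ? path : path.substring(0, slash);
        }
    }

    /** Probe first; only invoke the new operation when the FS declares support. */
    static Optional<String> clusterRootIfSupported(FsLike fs, String path) {
        if (!fs.hasPathCapability(path, CLUSTER_ROOT)) {
            return Optional.empty();
        }
        return Optional.of(fs.getClusterRoot(path));
    }

    public static void main(String[] args) {
        System.out.println(clusterRootIfSupported(new FederatedFs(), "/cluster1/data/part-0"));
        System.out.println(clusterRootIfSupported(new PlainFs(), "/data/part-0"));
    }
}
```

The point of the probe is that callers never need to catch UnsupportedOperationException to discover whether a filesystem implements the operation.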


Yes, that's a lot of work. But the file system APIs are things we have 
already maintained for over a decade; they are broadly used and implemented 
in many more places than just HDFS. Anything that goes into those classes 
will need to be maintained for a long time. We need to be rigorous here. 
This also means the review is going to have to be fairly strict too. Sorry.

> Add getClusterRoot and getClusterRoots methods to FileSystem and 
> ViewFilesystem
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-17072
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17072
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs, viewfs
>            Reporter: Virajith Jalaparti
>            Assignee: Virajith Jalaparti
>            Priority: Major
>         Attachments: HADOOP-17072.001.patch
>
>
> In a federated setting (HDFS federation, federation across multiple buckets 
> on S3, multiple containers across Azure storage), certain system 
> tools/pipelines require the ability to map paths to the clusters/accounts.
> Consider the example of GDPR compliance/retention jobs that need to go over 
> various datasets, ingested over a period of T days and remove/quarantine 
> datasets that are not properly annotated/have reached their retention period. 
> Such jobs can rely on renames to a global trash/quarantine directory to 
> accomplish their task. However, in a federated setting, efficient, atomic 
> renames (as those within a single HDFS cluster) are not supported across the 
> different clusters/shards in federation. As a result, such jobs will need to 
> leverage a trash/quarantine directory per cluster/shard. Further, they would 
> need to map from a particular path to the cluster/shard that contains this 
> path.
> To address such cases, this JIRA proposes to add two new methods to 
> {{FileSystem}}: {{getClusterRoot}} and {{getClusterRoots()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
