[
https://issues.apache.org/jira/browse/HADOOP-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627882#comment-15627882
]
Chris Douglas commented on HADOOP-13037:
----------------------------------------
Thanks for working through the build issues, and following up on the review
feedback [~vishwajeet.dusane].
This may not require a branch. While the patch is large, much of it is removing
code (to sever the WebHDFS dependency) and adding new tests. The former can be
committed directly to trunk, and the latter we can commit in one or more
followup issues (as subtasks of HADOOP-13257?). Any objections to this approach?
A few minor comments from skimming the latest patch:
* {{AccessTokenProviderCache}} uses an unusual singleton pattern. Does it need
to create a private instance of itself, rather than initializing a static map
directly?
* {{AdlPermission#equals}} and {{#hashcode}} implementations are identical to
what they override. Instead of reading from the config to report {{getAclBit}}
(which is expensive, and unstable) it could either store the boolean or return
basic {{FsPermission}} instances when disabled (and conversely,
{{AdlPermission}} can always return true when the config knob is set). That
this is a configurable boolean is surprising, and could be explained in this
class's (empty) javadoc.
* {{TokenProviderType::fromString}} is unused
* In initialize, {{accountFQDN}} init can move up, so the
{{accessTokenProvider}} init can be contiguous
* Both the subclasses of {{AzureADTokenProvider}} throw exceptions for all
abstract methods save init, and appear not to use the private instance they
configure. The init pattern here is pretty complicated, with {{initialize}}
creating an instance and adding it to a global cache, and not using the
reference it retains. Can this be simplified? If the instance were returned,
{{AccessTokenProviderCache}} additions can be moved to {{AdlFileSystem}}, so
all the get/add calls are in one class.
* The {{TokenProviderCache}} type seems unnecessary, if {{AzureADTP}} instances
can return null to prevent caching (instead of the instanceof/cast)
* It's a matter of taste, but importing all the static members of
{{AdlConfKeys}} in {{AdlFileSystem}} (or none of them) is easier to read.
* The {{<P>}} tags are unnecessary in javadocs. Some are also oddly formatted.
* Some of the test javadocs still refer to {{BufferManager.java}} and related
classes that have been removed
> Azure Data Lake Client: Support Azure data lake as a file system in Hadoop
> --------------------------------------------------------------------------
>
> Key: HADOOP-13037
> URL: https://issues.apache.org/jira/browse/HADOOP-13037
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs, fs/azure, tools
> Reporter: Shrikant Naidu
> Assignee: Vishwajeet Dusane
> Fix For: 2.9.0
>
> Attachments: HADOOP-13037 Proposal.pdf, HADOOP-13037-001.patch,
> HADOOP-13037-002.patch, HADOOP-13037-003.patch
>
>
> The jira proposes an improvement over HADOOP-12666 to remove webhdfs
> dependencies from the ADL file system client and build out a standalone
> client. At a high level, this approach would extend the Hadoop file system
> class to provide an implementation for accessing Azure Data Lake. The scheme
> used for accessing the file system will continue to be
> adl://<accountname>.azuredatalake.net/path/to/file.
> The Azure Data Lake Cloud Store will continue to provide a webHDFS rest
> interface. The client will access the ADLS store using WebHDFS Rest APIs
> provided by the ADLS store.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]