[
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138915#comment-15138915
]
Vishwajeet Dusane commented on HADOOP-12666:
--------------------------------------------
Thank you [~cnauroth] for the comments.
1. Yes, I will upload the respective document.
2. We do have extended contract test cases that integrate with the back-end
service; however, those tests are not pushed as part of this check-in. I will
create a separate JIRA for the live mode test cases.
3. We do run contract test cases.
For the code:
1. I have refactored the namespaces as per the comments from [~fabbri] and the
suggestion from [~cnauroth], as follows:
||Namespace||Purpose||
|org.apache.hadoop.fs.adl|Public interface exposed for Hadoop applications to integrate with. For long-term support, this namespace will stay even if we remove or refactor the dependency on org.apache.hadoop.hdfs.web.|
|org.apache.hadoop.hdfs.web|Extension of WebHdfsFileSystem to override protected functionality, for example ConnectionFactory access, overriding the redirection operation, etc.|
2. {panel}PrivateAzureDataLakeFileSystem{panel} is exposed for
{panel}AdlFileSystem{panel} to inherit. I will add the documentation for the
same.
3. Not adding a lock is intentional. Even if multiple instances are created,
the last instance would be used throughout to refresh the token.
4. Yes, I got a similar comment from [~chris.douglas] as well. The reason for
hiding logging this way was to switch quickly between {panel}Log{panel} and
{panel}System.out.println{panel} during debugging. Changing the code is
quicker than changing a configuration file. We will migrate to SLF4J, but not
as part of this patch release. Is that fine?
5. Explained above.
6. Agreed; I have incorporated the code change.
7. The {panel}FileStatus{panel} cache management feature is configurable. If
some scenarios break for a customer, they can turn off the local cache. The
cache scope is within the process. I will document the behavior and the tuning
flags, such as the duration of the cache. We do see a large performance
improvement; however, we do not wish to compromise on correctness.
8. {panel}ADLConfKeys#LOG_VERSION{panel} captures the code instrumentation
version. This information is used only during debugging sessions.
9. Excellent point. The bug was that the mock server hung when
{panel}TestAAAAAUploadBenchmark/TestAAAAADownloadBenchmark{panel} are not
executed before the other tests. I will investigate the root cause of this
issue since, as you pointed out, the execution order is not guaranteed.
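
To illustrate point 7, here is a minimal sketch of a process-local,
time-bounded cache with an on/off switch. It is not the code in the patch: the
class name {{ExpiringStatusCache}}, its constructor parameters, and the
injectable clock are all hypothetical, and a plain generic value stands in for
{{FileStatus}} so the example is self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch: a per-process cache whose entries expire after a
// configurable duration and which can be switched off entirely.
final class ExpiringStatusCache<V> {

    // One cached value together with the time at which it stops being valid.
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry<V>> entries =
        new ConcurrentHashMap<>();
    private final boolean enabled;     // assumed "turn off the cache" flag
    private final long ttlMillis;      // assumed "duration of the cache" flag
    private final LongSupplier clock;  // injectable so expiry can be tested

    ExpiringStatusCache(boolean enabled, long ttlMillis, LongSupplier clock) {
        this.enabled = enabled;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Stores a status for the path; a no-op when the cache is disabled.
    void put(String path, V status) {
        if (!enabled) {
            return;
        }
        entries.put(path, new Entry<>(status, clock.getAsLong() + ttlMillis));
    }

    // Returns the cached status, or null when the cache is disabled, the path
    // is unknown, or the entry has expired (callers then ask the service).
    V get(String path) {
        if (!enabled) {
            return null;
        }
        Entry<V> entry = entries.get(path);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            // Remove only this stale entry, not a newer one put concurrently.
            entries.remove(path, entry);
            return null;
        }
        return entry.value;
    }

    // Drops a path explicitly, e.g. after a local delete or rename.
    void invalidate(String path) {
        entries.remove(path);
    }
}
```

With the cache disabled, every {{get}} returns null and the caller always goes
to the back-end service, which is the correctness-first fallback described
above. A short duration bounds how stale an entry can be within the process.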
> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
> Key: HADOOP-12666
> URL: https://issues.apache.org/jira/browse/HADOOP-12666
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, tools
> Reporter: Vishwajeet Dusane
> Assignee: Vishwajeet Dusane
> Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch,
> HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-1.patch
>
> Original Estimate: 336h
> Time Spent: 336h
> Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing
> Hadoop applications such as MR, Hive, HBase, etc., to use the ADL store as
> input or output.
>
> ADL is an ultra-high-capacity store, optimized for massive throughput, with
> rich management and security features. More details are available at
> https://azure.microsoft.com/en-us/services/data-lake-store/
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)