[
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137935#comment-15137935
]
Lei (Eddy) Xu commented on HADOOP-12666:
----------------------------------------
Hey, [~vishwajeet.dusane]
Thanks for working on this nice patch.
I have a few questions:
* {code:title=FileStatusCacheManager.java}
* ACID properties are maintained in overloaded api in @see
* PrivateAzureDataLakeFileSystem class.
{code}
* The comment above states this, but {{PrivateAzureDataLakeFileSystem}}
does not call the cache within synchronized calls (e.g.,
{{PrivateAzureDataLakeFileSystem#create}}). Although {{syncMap}} is a
{{synchronizedMap}}, {{putFileStatus}} performs multiple operations on
{{syncMap}}, so the sequence as a whole is not guaranteed to be atomic.
* It might be a better idea to provide atomicity in
{{PrivateAzureDataLakeFileSystem}}. A couple of places have multiple cache
calls within the same function (e.g., {{rename()}}).
* It might be a good idea to rename {{FileStatusCacheManager#getFileStatus,
putFileStatus, removeFileStatus}} to {{get/put/remove}}, because the class name
already clearly indicates the context.
* {{FileStatusCacheObject}} can only store an absolute expiration time. And its
methods can be package-level methods.
* I saw a few places, e.g., {{PrivateAzureDataLakeFileSystem#rename/delete}},
that clear the cache if the param is a directory. Could you explain the reason
behind this? Would it cause noticeable performance degradation? Or, as an
alternative, how about using LinkedList + TreeMap for {{FileStatusCacheManager}}?
* One general question, is this FileStatusCacheManager in {{HdfsClient}}? If it
is the case, how do you make them consistent across clients on multiple nodes?
* Similar to above question, could you provide a reference architecture of how
to run a cluster on Azure Data Lake?
* {code}
if (b == null) {
throw new NullPointerException();
} else if (off < 0 || len < 0 || len > b.length - off) {
throw new IndexOutOfBoundsException();
} else if (len == 0) {
return 0;
}
{code}
Can we use {{Preconditions}} here? It will be more descriptive.
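To make the atomicity concern from the first question concrete, here is a minimal sketch. It is not code from the patch: the class {{CompoundCacheSketch}}, its methods, and the "keep the newer modification time" rule are hypothetical, chosen only to show a check-then-act sequence on a {{synchronizedMap}}. Each individual map call is thread-safe, but the sequence is atomic only when it holds the map's lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; names do not come from the patch.
class CompoundCacheSketch {
  // Stand-in for syncMap: path -> modification time of the cached status.
  private final Map<String, Long> syncMap =
      Collections.synchronizedMap(new HashMap<String, Long>());

  // NOT atomic: another thread may update the entry between get() and put(),
  // so an older modification time can overwrite a newer one.
  void unsafePutIfNewer(String path, long mtime) {
    Long current = syncMap.get(path);
    if (current == null || current < mtime) {
      syncMap.put(path, mtime);
    }
  }

  // Atomic: the whole check-then-act sequence runs under the lock that the
  // synchronizedMap wrapper itself uses for its individual calls.
  void safePutIfNewer(String path, long mtime) {
    synchronized (syncMap) {
      Long current = syncMap.get(path);
      if (current == null || current < mtime) {
        syncMap.put(path, mtime);
      }
    }
  }

  // Returns the cached modification time, or -1 if the path is not cached.
  long get(String path) {
    Long current = syncMap.get(path);
    return current == null ? -1L : current;
  }
}
```

Taking the lock in the caller instead (for example, around all cache calls made by one {{rename()}}) would have the same effect and would also cover sequences that span several cache methods.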
> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
> Key: HADOOP-12666
> URL: https://issues.apache.org/jira/browse/HADOOP-12666
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, tools
> Reporter: Vishwajeet Dusane
> Assignee: Vishwajeet Dusane
> Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch,
> HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-1.patch
>
> Original Estimate: 336h
> Time Spent: 336h
> Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing
> Hadoop applications such as MR, Hive, HBase, etc., to use ADL store as
> input or output.
>
> ADL is ultra-high capacity, optimized for massive throughput, with rich
> management and security features. More details are available at
> https://azure.microsoft.com/en-us/services/data-lake-store/