[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

Lei (Eddy) Xu (JIRA) Mon, 08 Feb 2016 15:00:06 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137935#comment-15137935 ]


Lei (Eddy) Xu commented on HADOOP-12666:
----------------------------------------

Hey, [~vishwajeet.dusane] 

Thanks for working on this nice patch.

I have a few questions:

* {code:title=FileStatusCacheManager.java}
 * ACID properties are maintained in overloaded api in @see
 * PrivateAzureDataLakeFileSystem class.
{code}

* The comment above says ACID properties are maintained in 
{{PrivateAzureDataLakeFileSystem}}, but that class does not call 
{{FileStatusCacheManager}} within synchronized calls (e.g., 
{{PrivateAzureDataLakeFileSystem#create}}). Although {{syncMap}} is a 
{{synchronizedMap}}, {{putFileStatus}} performs multiple operations on 
{{syncMap}}, so atomicity cannot be guaranteed.

* It might be a better idea to provide atomicity in 
{{PrivateAzureDataLakeFileSystem}}. A couple of places have multiple cache 
calls within the same function (e.g., {{rename()}}).
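
To illustrate the atomicity concern, here is a minimal sketch. The class and method names are hypothetical stand-ins, not the code in the patch. It assumes the cache is backed by {{Collections.synchronizedMap}}, where each single call is atomic but a sequence of calls is not unless the caller holds the map's monitor.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the cache in the patch; names are illustrative only.
class SketchStatusCache {
  private final Map<String, Long> syncMap =
      Collections.synchronizedMap(new HashMap<String, Long>());

  // Each call on syncMap is atomic on its own, but this check-then-act
  // sequence is not: another thread can run between containsKey and put.
  void putNonAtomic(String path, long expiry) {
    if (syncMap.containsKey(path)) {
      syncMap.remove(path);
    }
    syncMap.put(path, expiry);
  }

  // Holding the wrapper's monitor makes the whole sequence atomic, because
  // the map returned by Collections.synchronizedMap locks on itself.
  void putAtomic(String path, long expiry) {
    synchronized (syncMap) {
      if (syncMap.containsKey(path)) {
        syncMap.remove(path);
      }
      syncMap.put(path, expiry);
    }
  }

  // A rename touches two keys, so both updates happen under one lock.
  void rename(String src, String dst) {
    synchronized (syncMap) {
      Long expiry = syncMap.remove(src);
      if (expiry != null) {
        syncMap.put(dst, expiry);
      }
    }
  }

  Long get(String path) {
    return syncMap.get(path);
  }

  int size() {
    return syncMap.size();
  }
}
```

Whether the lock belongs in the cache class or in the file system methods that call it is a design choice; the sketch only shows that the lock has to span the whole compound operation.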

* It might be a good idea to rename {{FileStatusCacheManager#getFileStatus, 
putFileStatus, removeFileStatus}} to {{get/put/remove}}, because the class name 
already clearly indicates the context.

* {{FileStatusCacheObject}} can only store an absolute expiration time. And its 
methods can be package-level methods.

* I saw a few places, e.g., {{PrivateAzureDataLakeFileSystem#rename/delete}}, 
that clear the cache if the param is a directory. Could you explain the 
reasoning behind this? Would it cause noticeable performance degradation? 
Alternatively, could {{FileStatusCacheManager}} use a LinkedList + TreeMap?
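
One possible reading of the TreeMap suggestion, sketched below with hypothetical names: keep entries sorted by path so that a directory rename or delete invalidates only the entries under that directory instead of clearing the whole cache. The sketch covers only the TreeMap part, and assumes paths are normalized, use {{/}} as the separator, and that the directory is not the root and has no trailing slash.

```java
import java.util.TreeMap;

// Hypothetical sketch, not the patch's implementation.
class SketchPrefixCache {
  // Sorted by path, so all descendants of a directory are contiguous.
  private final TreeMap<String, Long> entries = new TreeMap<String, Long>();

  synchronized void put(String path, long expiry) {
    entries.put(path, expiry);
  }

  synchronized boolean contains(String path) {
    return entries.containsKey(path);
  }

  synchronized int size() {
    return entries.size();
  }

  // Removes dir itself and every entry below it. '0' is the character
  // immediately after '/', so [dir + "/", dir + "0") covers exactly the
  // keys that start with dir + "/".
  synchronized void removeTree(String dir) {
    entries.remove(dir);
    entries.subMap(dir + "/", dir + "0").clear();
  }
}
```

With entries for {{/a}}, {{/a/b}}, {{/a/b/c}}, {{/ab}} and {{/b}}, calling {{removeTree("/a")}} drops the first three and leaves {{/ab}} and {{/b}} cached.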

* One general question: does this {{FileStatusCacheManager}} live in 
{{HdfsClient}}? If so, how do you keep the caches consistent across clients on 
multiple nodes?

* Similar to the above question, could you provide a reference architecture for 
running a cluster on Azure Data Lake?

* {code}
if (b == null) {
  throw new NullPointerException();
} else if (off < 0 || len < 0 || len > b.length - off) {
  throw new IndexOutOfBoundsException();
} else if (len == 0) {
  return 0;
}
{code}

Can we use {{Preconditions}} here? It would be more descriptive.
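
As a rough sketch of what more descriptive checks could look like: the helper below is hypothetical and uses only the JDK so that it stands alone. Guava's {{Preconditions.checkNotNull(reference, errorMessage)}} could replace the null check in the same way. The exception types match the snippet above; only the messages are added.

```java
import java.util.Objects;

// Hypothetical helper; the class and method names are illustrative only.
class SketchReadChecks {
  // Validates read(b, off, len) arguments with descriptive messages.
  // Returns false when len == 0, meaning the caller can return 0 at once.
  static boolean validate(byte[] b, int off, int len) {
    Objects.requireNonNull(b, "buffer must not be null");
    if (off < 0 || len < 0 || len > b.length - off) {
      throw new IndexOutOfBoundsException(
          "off=" + off + ", len=" + len + ", buffer length=" + b.length);
    }
    return len != 0;
  }
}
```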

> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: HADOOP-12666-002.patch, HADOOP-12666-003.patch, 
> HADOOP-12666-004.patch, HADOOP-12666-005.patch, HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications such as MR, Hive, HBase, etc., to use ADL store as 
> input or output.
>  
> ADL is ultra-high capacity and optimized for massive throughput, with rich 
> management and security features. More details are available at 
> https://azure.microsoft.com/en-us/services/data-lake-store/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
