[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

Chris Douglas (JIRA) Mon, 16 May 2016 21:35:36 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286021#comment-15286021
 ]


Chris Douglas commented on HADOOP-12666:
----------------------------------------

Guessing that trainwreck on Jenkins was before HADOOP-13161. Will wait for an 
all-clear before resubmitting.

[~vishwajeet.dusane], could you review the proposed changes to 
{{CachedRefreshTokenBasedAccessTokenProvider}}? It does not reconfigure all 
running clients on {{setConf}}; it only shares an instance among clients 
created with the same ID. The replacement criteria were a guess based on the 
parameters used by {{ConfRefreshTokenBasedAccessTokenProvider}}, so please 
correct them as appropriate.
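
To make the sharing behavior concrete, here is a minimal sketch of a provider cache keyed by client ID. It is not the code in the patch: {{TokenProvider}}, {{SharedTokenProviders}}, and the choice of refresh token and refresh URL as replacement criteria are all hypothetical stand-ins. The point is only that a lookup with the same ID and unchanged parameters returns the shared instance, while changed parameters replace the cached entry without reconfiguring any client that already holds the old one.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a token provider; not the class in the patch. */
final class TokenProvider {
  final String clientId;
  final String refreshToken;
  final String refreshUrl;

  TokenProvider(String clientId, String refreshToken, String refreshUrl) {
    this.clientId = clientId;
    this.refreshToken = refreshToken;
    this.refreshUrl = refreshUrl;
  }

  /** Assumed replacement criteria: both parameters must be unchanged. */
  boolean matches(String token, String url) {
    return Objects.equals(refreshToken, token) && Objects.equals(refreshUrl, url);
  }
}

/** Shares one provider per client ID; replaces it when parameters differ. */
final class SharedTokenProviders {
  private final ConcurrentMap<String, TokenProvider> cache =
      new ConcurrentHashMap<>();

  TokenProvider get(String clientId, String refreshToken, String refreshUrl) {
    // compute() runs atomically per key, so concurrent callers using the
    // same ID and parameters all observe the same instance.
    return cache.compute(clientId, (id, current) ->
        current != null && current.matches(refreshToken, refreshUrl)
            ? current
            : new TokenProvider(id, refreshToken, refreshUrl));
  }
}
```

Clients that obtained a provider before a replacement keep their old instance, which matches the "does not reconfigure all running clients" behavior described above.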

HADOOP-13037 will remove the dependency on WebHDFS, largely rewriting this 
client. The buffering in {{PrivateAzureDataLakeFileSystem}} should also be 
rewritten. It's implementing something like demand-paging, but some of the 
[control 
flow|https://issues.apache.org/jira/browse/HADOOP-12666?focusedCommentId=15283360&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15283360]
 would be more powerful, and more understandable, if it were layered more 
conventionally. Configuring the client is also very complex. I tried the 
directions, but only arrived at a working client with Vishwajeet's help.

The target version is 2.9, but we should hold off on backporting this until 
it's easier to use and maintain. I would like to commit the result of review 
from [~cnauroth], [~eddyxu], [~twu], [~fabbri], and [~mackrorysd] to trunk. 
It'll be easier to fix up the patch in targeted JIRAs. Committing the contract 
tests in HADOOP-12875 would also be helpful. This would be with the caveats 
from HDFS-9938: this module may be removed if it impedes WebHDFS development. 
Further, it should be easier to configure before we include it in a release. Is 
this an acceptable path forward?

> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: Create_Read_Hadoop_Adl_Store_Semantics.pdf, 
> HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch, 
> HADOOP-12666-005.patch, HADOOP-12666-006.patch, HADOOP-12666-007.patch, 
> HADOOP-12666-008.patch, HADOOP-12666-009.patch, HADOOP-12666-010.patch, 
> HADOOP-12666-011.patch, HADOOP-12666-012.patch, HADOOP-12666-013.patch, 
> HADOOP-12666-014.patch, HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications such as MR, Hive, HBase, etc. to use ADL store as 
> input or output.
>  
> ADL is ultra-high capacity, optimized for massive throughput, with rich 
> management and security features. More details are available at 
> https://azure.microsoft.com/en-us/services/data-lake-store/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

