[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

Vishwajeet Dusane (JIRA) Tue, 22 Mar 2016 06:53:05 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206390#comment-15206390
 ]


Vishwajeet Dusane commented on HADOOP-12666:
--------------------------------------------


Participants: [~cnauroth], [~fabbri], [~hanm], [~twu], [~chris.douglas], 
[~vishwajeet.dusane], [~shrikant].
 
*Meeting agenda* 
*   Discuss current comments and outstanding issues on HADOOP-12666
*   Approach of subclassing WebHDFS (HDFS-9938)
 
*Packaging:* 
* Of the three options discussed in HDFS-9938, Chris N mentioned he was okay 
with the comments posted on the Jira. Maintaining the current approach seems 
to be the best way forward in the short term, on the understanding that the 
current approach is temporary and the client will evolve to be independent of 
WebHDFS. The Jira will be updated with this information.
* The initial approach of using WebHDFS was a good starting point, but the 
reviewers feel the ADL client should evolve to be independent of WebHDFS.
* With the current approach, changes in WebHDFS will impact the ADL client.
* The recommendation was to publish a long-term plan for a solution 
independent of WebHDFS, targeting 2.9 for the separate ADL client.
* Having such a plan would make the community more comfortable accepting the 
current solution.
 
*Create-Append Semantics:*
* Overall create/append semantics were discussed. A concern was raised that 
they do not ensure single-writer semantics.
* Chris N noted that this deviates from HDFS, which enforces single-writer 
semantics, and that other cloud storage file systems such as WASB and S3 take 
the same approach.
* Some applications do require this capability; typically they start writing 
to the same path on recovery (e.g. HBase).
* File systems like WASB have made specific updates to address the needs of 
certain applications where handling multiple writers was an issue, e.g. 
HBase. WASB has implemented a specific lease mechanism for HBase.
* The ADL client also implements similar lease semantics for HBase; this is 
done specifically for createNonRecursive.
** It was clarified that the lease ID is generated using a GUID, and there was 
agreement on this approach.
** This information will be included in a separate document to be uploaded 
to the Jira (HADOOP-12666).
* Chris N did mention that the general guideline for applications is to have 
each instance write data to its separate file and then commit by renaming it.
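
The write-then-rename guideline above can be sketched as follows. This is a 
minimal illustration on a local file system using java.nio only; it is not 
taken from the ADL patch. The class name RenameCommitSketch and the helper 
writeThenRename are hypothetical. Against a Hadoop FileSystem the analogous 
steps would be FileSystem.create followed by FileSystem.rename, and whether 
the rename is atomic depends on the underlying store.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

public class RenameCommitSketch {

    // Hypothetical helper: write to a writer-private temporary file in the
    // target's directory, then commit by renaming it to the final path.
    static Path writeThenRename(Path target, byte[] data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // A random UUID in the name keeps concurrent writers on separate files.
        Path temp = dir.resolve(
                "." + target.getFileName() + "." + UUID.randomUUID() + ".tmp");
        Files.write(temp, data);
        try {
            // The rename is the commit point; readers never see a partial file.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            // Clean up the uncommitted file if the rename fails.
            Files.deleteIfExists(temp);
            throw e;
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-commit");
        Path target = dir.resolve("part-00000");
        writeThenRename(target, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(
                new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

Because each instance writes only to its own temporary file, no two writers 
ever hold the same path open, which sidesteps the missing single-writer 
enforcement discussed above.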
 
* All accepted comments have been included in the latest patch.
* The Buffer Manager race condition has been fixed in the latest patch.
 
* Contract test cases for HDFS do not implement ACL-related test cases, since 
none of the file system extensions support them.
** Would need to create new contract tests for ACLs.
 
* Across the reviewers on the call there were no further objections to the 
core patches. Reviewers plan to complete one more review of the updated 
patches.
* HADOOP-12875 has been updated with a patch that adds the ability to run 
live tests using contract tests; new test cases have been added.
 
*Follow-up action items*
* Share/upload a document that covers:
** Read semantics, read-ahead buffering, and create/append semantics
** Lease ID generation
* Share an overall plan on the roadmap for the ADL client, essentially the 
plan for removing the dependency on WebHDFS (a "+1" on the Jira will be 
contingent on publishing this plan).
* Next step is for reviewers to complete the review of the new patch (Aaron 
to help with Cloudera reviewers).
* Produce a page for alternative file systems
** Document the differences from HDFS; example: HADOOP-12635
* Attach detailed documentation on the file status cache (HADOOP-12876)

> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-12666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12666
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, tools
>            Reporter: Vishwajeet Dusane
>            Assignee: Vishwajeet Dusane
>         Attachments: Create_Read_Hadoop_Adl_Store_Semantics.pdf, 
> HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch, 
> HADOOP-12666-005.patch, HADOOP-12666-006.patch, HADOOP-12666-007.patch, 
> HADOOP-12666-008.patch, HADOOP-12666-009.patch, HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>          Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications such as MR, Hive, HBase, etc., to use ADL store as 
> input or output.
>  
> ADL offers ultra-high capacity and is optimized for massive throughput, with 
> rich management and security features. More details are available at 
> https://azure.microsoft.com/en-us/services/data-lake-store/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
