[
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206390#comment-15206390
]
Vishwajeet Dusane commented on HADOOP-12666:
--------------------------------------------
Participants: [~cnauroth], [~fabbri], [~hanm], [~twu], [~chris.douglas],
[~vishwajeet.dusane], [~shrikant].
*Meeting agenda*
* Discuss current comments and outstanding issues on HADOOP-12666
* Approach of sub-classing webhdfs HDFS-9938
*Packaging:*
* Given the three options that were discussed in HDFS-9938, Chris N mentioned
he was okay with the comments posted on the Jira. The option to maintain the
current approach seems to be the best way forward in the short term. This is
based on the understanding that the current approach is temporary and the
client will evolve to be independent of WebHDFS. The Jira will be updated with
this information.
* The initial approach of using WebHDFS was a good starting point, but the
reviewers feel that the ADL client should evolve independently of WebHDFS.
* With the current approach, changes in WebHDFS will impact the ADLS client.
* The recommendation was to publish a long-term plan for a solution
independent of WebHDFS, targeting 2.9 for the separate ADL client.
* Having such a plan would make the community more comfortable accepting the
current solution.
*Create-Append Semantics:*
* Discussed overall create/append semantics. A concern was raised that the
current implementation does not ensure single-writer semantics.
* Chris N mentioned that this is a deviation from the HDFS behavior of enforcing
single-writer semantics, and also stated that other cloud storage systems take
the same approach, e.g. WASB and S3.
* Some applications do require this capability; typically these applications
start writing to the same path on recovery (e.g. HBase).
* File systems like WASB have made specific updates to address the needs of
certain applications where handling multiple writers was an issue, e.g.
HBase. WASB has implemented a specific lease implementation for HBase.
* The ADL client implementation also implements a similar lease semantic for
HBase, and this is done specifically for createNonRecursive.
** It was clarified that the lease ID is generated using a GUID, and
there was agreement on this approach.
** This information will be included in a separate document to be uploaded
to the Jira (HADOOP-12666)
* Chris N mentioned that the general guideline for applications is to have
each instance write data to its own separate file and then commit by renaming it.
* All accepted comments have been included in the latest patch
* Buffer Manager Race condition - has been fixed in the latest patch.
* Contract test cases for HDFS do not implement ACL-related test cases, since
none of the file system extensions support them.
** Would need to create new contract tests for ACLs.
* Overall, the reviewers on the call had no further objections to the
core patches. Reviewers plan to complete one more review of the updated patches.
* HADOOP-12875 has been updated with a patch that adds the ability to run
live tests using contract tests; new test cases have been added.
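As an illustration of two points from the create/append discussion (a GUID-based lease ID, and the guideline that each writer writes to its own file and commits by renaming), here is a minimal sketch. It is not the ADL client code: it uses java.nio on the local file system as a stand-in for a Hadoop file system, and the class and method names (LeaseAndCommitSketch, newLeaseId, writeThenRename) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

/** Illustrative only: a local-filesystem analogue, not the ADL client implementation. */
final class LeaseAndCommitSketch {

    /** Hypothetical helper: produce a lease ID from a random GUID (UUID). */
    static String newLeaseId() {
        return UUID.randomUUID().toString();
    }

    /**
     * Write data to a writer-private temporary file, then commit it by renaming
     * it to the final name. Concurrent writers never share a temporary file,
     * so readers of the final path only ever see a fully written file.
     */
    static Path writeThenRename(Path dir, String finalName, byte[] data) {
        // A random suffix keeps each writer's temporary file distinct.
        Path tmp = dir.resolve("." + finalName + "." + UUID.randomUUID() + ".tmp");
        Path dst = dir.resolve(finalName);
        try {
            Files.write(tmp, data);
            // Same-directory rename; ATOMIC_MOVE asks for an atomic rename.
            return Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        Path committed = writeThenRename(dir, "part-00000", payload);
        System.out.println("lease id:  " + newLeaseId());
        System.out.println("committed: " + committed.getFileName());
    }
}
```

With a Hadoop file system, the analogous calls would be a create on a per-writer temporary path followed by a rename; whether that rename is atomic depends on the store, so the guarantees shown here should be treated as an assumption of the sketch.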
*Followup Action items*
* Share/upload document that covers
** information on read semantics, read ahead buffering, Create/Append semantics
** Lease id generation to be included in the document
* Share an overall plan on the roadmap for the ADL client - essentially what is
the plan for removing the dependency on WebHDFS (a "+1" on the Jira will be
contingent on publishing this plan).
* Next step is for reviewers to complete the review of the new patch (Aaron to
help with Cloudera reviewers)
* Produce a page for alternative file systems
** Documents the differences from HDFS. Example: HADOOP-12635
* Attach detailed documentation on file status cache (HADOOP-12876)
> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
> Key: HADOOP-12666
> URL: https://issues.apache.org/jira/browse/HADOOP-12666
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, tools
> Reporter: Vishwajeet Dusane
> Assignee: Vishwajeet Dusane
> Attachments: Create_Read_Hadoop_Adl_Store_Semantics.pdf,
> HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch,
> HADOOP-12666-005.patch, HADOOP-12666-006.patch, HADOOP-12666-007.patch,
> HADOOP-12666-008.patch, HADOOP-12666-009.patch, HADOOP-12666-1.patch
>
> Original Estimate: 336h
> Time Spent: 336h
> Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing
> Hadoop applications such as MR, Hive, HBase, etc., to use ADL store as
> input or output.
>
> ADL is ultra-high capacity, optimized for massive throughput, with rich
> management and security features. More details available at
> https://azure.microsoft.com/en-us/services/data-lake-store/
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)