Of course, my +1

On Thu, Oct 26, 2017 at 12:31 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> Now that the discussion thread on the Crail proposal has ended, please
> vote on accepting Crail into the Apache Incubator.
>
> The ASF voting rules are described at:
> http://www.apache.org/foundation/voting.html
>
> A vote for accepting a new Apache Incubator podling is a majority vote
> for which only Incubator PMC member votes are binding.
>
> Votes from other people are also welcome as an indication of people's
> enthusiasm (or lack thereof).
>
> Please do not use this VOTE thread for discussions.
> If needed, start a new thread instead.
>
> This vote will run for at least 72 hours. Please VOTE as follows:
>
> [ ] +1 Accept Crail into the Apache Incubator
> [ ] +0 Abstain.
> [ ] -1 Do not accept Crail into the Apache Incubator because ...
>
> The proposal below is also on the wiki:
> https://wiki.apache.org/incubator/CrailProposal
>
> ===
>
> Abstract
>
> Crail is a storage platform for sharing performance-critical data in
> distributed data processing jobs at very high speed. Crail is built
> entirely upon principles of user-level I/O and specifically targets data
> center deployments with fast network and storage hardware (e.g., 100Gbps
> RDMA, plenty of DRAM, NVMe flash, etc.) as well as new modes of operation
> such as resource disaggregation or serverless computing. Crail is written
> in Java and integrates seamlessly with the Apache data processing
> ecosystem. It can be used as a backbone to accelerate high-level data
> operations such as shuffle or broadcast, as a cache to store hot data that
> is queried repeatedly, or as a storage platform for sharing inter-job data
> in complex multi-job pipelines.
>
> Proposal
>
> Crail enables Apache data processing frameworks to run efficiently in
> next-generation data centers using fast storage and network hardware in
> combination with resource (e.g., DRAM, flash) disaggregation.
>
> Background
>
> Crail started as a research project at the IBM Zurich Research Laboratory
> around 2014, aiming to integrate high-speed I/O hardware effectively into
> large-scale data processing systems.
>
> Rationale
>
> During the last decade, I/O hardware has undergone rapid performance
> improvements, typically by orders of magnitude. Modern networking and
> storage hardware can deliver 100+ Gbps (10+ GBps) bandwidth with access
> latencies of a few microseconds. However, despite such progress in raw I/O
> performance, effectively leveraging modern hardware in data processing
> frameworks remains challenging. In most cases, upgrading to high-end
> networking or storage hardware has very little effect on the performance
> of analytics workloads. The problem comes from heavily layered software
> imposing overheads such as deep call stacks, unnecessary data copies,
> thread contention, etc. These problems have already been addressed at the
> operating system level with new I/O APIs such as RDMA verbs, NVMe, etc.,
> allowing applications to bypass software layers during I/O operations.
> Distributed data processing frameworks, on the other hand, are typically
> implemented on legacy I/O interfaces such as sockets or block storage.
> These interfaces have been shown to be insufficient to deliver the full
> hardware performance. Yet, to the best of our knowledge, there are no
> active and systematic efforts to integrate these new user-level I/O APIs
> into Apache software frameworks. This problem affects all end-users and
> organizations that use Apache software. We expect them to see only
> unsatisfactorily small performance gains when upgrading their networking
> and storage hardware.
>
> Crail solves this problem by providing an efficient storage platform built
> upon user-level I/O, thus bypassing layers such as the JVM and the OS
> during I/O operations. Moreover, Crail directly leverages the specific
> hardware features of RDMA and NVMe to provide a better integration with
> high-level data operations in Apache compute frameworks. As a consequence,
> Crail enables users to run larger, more complex queries against
> ever-increasing amounts of data at a speed largely determined by the
> deployed hardware. Crail is a generic solution that integrates well with
> the Apache ecosystem, including frameworks like Spark, Hadoop, Hive, etc.
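>
> As an illustration, the minimal sketch below shows how an application
> could read Crail-managed data through the standard Hadoop FileSystem API,
> assuming Crail's HDFS adapter is on the classpath and the crail:// scheme
> is mapped to it in core-site.xml; the host name, port, and path are
> placeholders, not defaults taken from the project.
>
>     import java.net.URI;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FSDataInputStream;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     public class CrailReadSketch {
>         public static void main(String[] args) throws Exception {
>             // Assumes Crail's HDFS adapter is installed and crail:// is
>             // configured in core-site.xml; host/port/path are placeholders.
>             Configuration conf = new Configuration();
>             FileSystem fs = FileSystem.get(
>                     URI.create("crail://crail-namenode:9060/"), conf);
>             try (FSDataInputStream in = fs.open(new Path("/tmp/shuffle/part-00000"))) {
>                 byte[] buf = new byte[1 << 16];
>                 long total = 0;
>                 int n;
>                 while ((n = in.read(buf)) > 0) {
>                     total += n; // bytes are served by Crail from DRAM/NVMe
>                 }
>                 System.out.println("read " + total + " bytes");
>             }
>             fs.close();
>         }
>     }
>
> Because the access path is the familiar FileSystem API, existing
> Hadoop-based applications need only configuration changes to benefit.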
> Initial Goals
>
> The initial goal of moving Crail to the Apache Incubator is to broaden the
> community and foster contributions from developers to leverage Crail in
> various data processing frameworks and workloads. Ultimately, the goal for
> Crail is to become the de-facto standard platform for storing temporary,
> performance-critical data in distributed data processing systems.
>
> Current Status
>
> The initial code has been developed at the IBM Zurich Research Laboratory
> and has recently been made available on GitHub under the Apache Software
> License 2.0. The project currently has explicit support for Spark and
> Hadoop. Project documentation is available on the website www.crail.io.
> There is also a public forum for discussions related to Crail available at
> https://groups.google.com/forum/#!forum/zrlio-users.
>
> Meritocracy
>
> The current developers are familiar with the meritocratic open source
> development process at Apache. Over the last year, the project has
> gathered interest on GitHub, and several companies have already expressed
> interest in the project. We plan to invest in supporting a meritocracy by
> inviting additional developers to participate.
>
> Community
>
> The need for a generic solution to integrate high-performance I/O hardware
> into open source software is tremendous, so there is potential for a very
> large community. We believe that Crail's extensible architecture and its
> alignment with the Apache ecosystem will further encourage community
> participation. We expect that over time Crail will attract a large
> community.
>
> Alignment
>
> Crail is written in Java and is built for the Apache data processing
> ecosystem. The basic storage services of Crail can be used seamlessly from
> Spark, Hadoop, or Storm. The enhanced storage services require dedicated,
> framework-specific bindings, which are currently available only for Spark
> (see the configuration sketch below). We think that moving Crail to the
> Apache Incubator will help to extend Crail's support for different data
> processing frameworks.
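>
> As a sketch of how such a Spark binding would typically be enabled, the
> example below wires a Crail-backed shuffle manager in through Spark's
> standard configuration mechanism. The shuffle-manager class name is a
> placeholder used for illustration only; the actual name is defined by
> Crail's Spark plugin.
>
>     import java.util.Arrays;
>     import org.apache.spark.SparkConf;
>     import org.apache.spark.api.java.JavaSparkContext;
>     import scala.Tuple2;
>
>     public class CrailShuffleSketch {
>         public static void main(String[] args) {
>             // Illustrative only: the class name below is a placeholder for
>             // the shuffle manager shipped with Crail's Spark binding.
>             SparkConf conf = new SparkConf()
>                     .setAppName("crail-shuffle-sketch")
>                     .set("spark.shuffle.manager",
>                          "org.apache.spark.shuffle.crail.CrailShuffleManager");
>             try (JavaSparkContext sc = new JavaSparkContext(conf)) {
>                 // Any shuffle-heavy stage (groupByKey, joins, sorts) now moves
>                 // its intermediate data through Crail instead of local disk
>                 // and sockets.
>                 sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6))
>                   .mapToPair(x -> new Tuple2<>(x % 2, x))
>                   .groupByKey()
>                   .collect();
>             }
>         }
>     }
>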
> Known Risks
>
> To date, development has been sponsored by IBM and coordinated mostly by
> the core team of researchers at the IBM Zurich Research Laboratory. For
> Crail to fully transition to an "Apache Way" governance model, it needs to
> start embracing the meritocracy-centric way of growing the community of
> contributors.
>
> Orphaned Products
>
> The Crail developers have a long-term interest in the use and maintenance
> of the code, and growing a diverse community around the project will
> further guard against it becoming orphaned. We feel that it is also
> important to put formal governance in place for both the project and the
> contributors as the project expands, and that the ASF is the best place
> for this.
>
> Inexperience with Open Source
>
> Several of the initial committers are experienced open source developers
> (Linux kernel, DPDK, etc.).
>
> Relationships with Other Apache Products
>
> As of now, Crail has been tested with Spark, Hadoop, and Hive, but it is
> designed to integrate with any of the Apache data processing frameworks.
>
> Homogeneous Developers
>
> The project already has a diverse developer base, including contributions
> from organizations and independent developers.
>
> An Excessive Fascination with the Apache Brand
>
> Crail solves a real need for a generic approach to leveraging modern
> network and storage hardware effectively in the Apache Hadoop and Spark
> ecosystems. Our rationale for developing Crail as an Apache project is
> detailed in the Rationale section. We believe that the Apache brand and
> community process will help us to engage a larger community and facilitate
> closer ties with various Apache data processing projects.
>
> Documentation
>
> Documentation regarding Crail is available at www.crail.io.
>
> Initial Source
>
> The initial source is available on GitHub under the Apache License 2.0:
>
> https://github.com/zrlio/crail
>
> External Dependencies
>
> Crail is written in Java and currently supports the Apache Hadoop
> MapReduce and Apache Spark runtimes. To the best of our knowledge, all
> dependencies of Crail are distributed under Apache-compatible licenses.
>
> Required Resources
>
> Mailing lists
>
> priv...@crail.incubator.apache.org
> d...@crail.incubator.apache.org
> comm...@crail.incubator.apache.org
>
> Git repository
>
> https://git-wip-us.apache.org/repos/asf/incubator-crail.git
>
> Issue Tracking
>
> JIRA (Crail)
>
> Initial Committers
>
> Patrick Stuedi <stu AT ibm DOT zurich DOT com>
> Animesh Trivedi <atr AT ibm DOT zurich DOT com>
> Jonas Pfefferle <jpf AT ibm DOT zurich DOT com>
> Bernard Metzler <bmt AT ibm DOT zurich DOT com>
> Michael Kaufmann <kau AT ibm DOT zurich DOT com>
> Adrian Schuepbach <dri AT ibm DOT zurich DOT com>
> Patrick McArthur <patrick AT patrickmcarthur DOT net>
> Ana Klimovic <anakli AT stanford DOT edu>
> Yuval Degani <yuvaldeg AT mellanox DOT com>
> Vu Pham <vuhuong AT mellanox DOT com>
>
> Affiliations
>
> IBM (Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, Bernard Metzler,
> Michael Kaufmann, Adrian Schuepbach)
> University of New Hampshire (Patrick McArthur)
> Stanford University (Ana Klimovic)
> Mellanox (Yuval Degani, Vu Pham)
>
> Sponsors
>
> Champion
>
> Luciano Resende <lresende AT apache DOT org>
>
> Nominated Mentors
>
> Luciano Resende <lresende AT apache DOT org>
> Raphael Bircher <rbircher AT apache DOT org>
> Julian Hyde <jhyde AT apache DOT org>
>
> Sponsoring Entity
>
> We propose that the Apache Incubator sponsor this project.
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/

--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/