Re: [VOTE] Accept Crail into the Apache Incubator

Debo Dutta (dedutta) Thu, 26 Oct 2017 09:41:19 -0700

+1

On 10/26/17, 9:30 AM, "Gang(Gary) Wang" <ga...@apache.org> wrote:


    +1
    
    
    On Thu, Oct 26, 2017 at 9:25 AM, Clebert Suconic <clebert.suco...@gmail.com>
    wrote:
    
    > +1
    >
    > On Thu, Oct 26, 2017 at 12:01 PM, Luciano Resende <luckbr1...@gmail.com>
    > wrote:
    > > Of course, my +1
    > >
    > > On Thu, Oct 26, 2017 at 12:31 PM, Luciano Resende <luckbr1...@gmail.com>
    > > wrote:
    > >
    > >> Now that the discussion thread on the Crail proposal has ended, please
    > >> vote on accepting Crail into the Apache Incubator.
    > >>
    > >> The ASF voting rules are described at:
    > >>    http://www.apache.org/foundation/voting.html
    > >>
    > >> A vote for accepting a new Apache Incubator podling is a majority vote
    > >> for which only Incubator PMC member votes are binding.
    > >>
    > >> Votes from other people are also welcome as an indication of people's
    > >> enthusiasm (or lack thereof).
    > >>
    > >> Please do not use this VOTE thread for discussions.
    > >> If needed, start a new thread instead.
    > >>
    > >> This vote will run for at least 72 hours. Please VOTE as follows:
    > >> [] +1 Accept Crail into the Apache Incubator
    > >> [] +0 Abstain.
    > >> [] -1 Do not accept Crail into the Apache Incubator because ...
    > >>
    > >> The proposal below is also on the wiki:
    > >> https://wiki.apache.org/incubator/CrailProposal
    > >>
    > >> ===
    > >>
    > >> Abstract
    > >>
    > >> Crail is a storage platform for sharing performance critical data in
    > >> distributed data processing jobs at very high speed. Crail is built
    > >> entirely upon principles of user-level I/O and specifically targets
    > >> data center deployments with fast network and storage hardware (e.g.,
    > >> 100Gbps RDMA, plenty of DRAM, NVMe flash, etc.) as well as new modes
    > >> of operation such as resource disaggregation or serverless computing.
    > >> Crail is written in Java and integrates seamlessly with the Apache
    > >> data processing ecosystem. It can be used as a backbone to accelerate
    > >> high-level data operations such as shuffle or broadcast, or as a cache
    > >> to store hot data that is queried repeatedly, or as a storage platform
    > >> for sharing inter-job data in complex multi-job pipelines, etc.
    > >>
    > >> Proposal
    > >>
    > >> Crail enables Apache data processing frameworks to run efficiently in
    > >> next generation data centers using fast storage and network hardware
    > >> in combination with resource (e.g., DRAM, Flash) disaggregation.
    > >>
    > >> Background
    > >>
    > >> Crail started as a research project at the IBM Zurich Research
    > >> Laboratory around 2014, aiming to integrate high-speed I/O hardware
    > >> effectively into large scale data processing systems.
    > >>
    > >> Rationale
    > >>
    > >> During the last decade, I/O hardware has undergone rapid performance
    > >> improvements, typically by orders of magnitude. Modern networking and
    > >> storage hardware can deliver 100+ Gbps (10+ GBps) bandwidth with
    > >> access latencies of a few microseconds. However, despite such progress
    > >> in raw I/O performance, effectively leveraging modern hardware in data
    > >> processing frameworks remains challenging. In most cases, upgrading to
    > >> high-end networking or storage hardware has very little effect on the
    > >> performance of analytics workloads. The problem comes from heavily
    > >> layered software imposing overheads such as deep call stacks,
    > >> unnecessary data copies, thread contention, etc. These problems have
    > >> already been addressed at the operating system level with new I/O APIs
    > >> such as RDMA verbs, NVMe, etc., allowing applications to bypass
    > >> software layers during I/O operations. Distributed data processing
    > >> frameworks, on the other hand, are typically implemented on legacy I/O
    > >> interfaces such as sockets or block storage. These interfaces have
    > >> been shown to be insufficient to deliver the full hardware
    > >> performance. Yet, to the best of our knowledge, there are no active
    > >> and systematic efforts to integrate these new user-level I/O APIs into
    > >> Apache software frameworks. This problem affects all end users and
    > >> organizations that use Apache software. We expect them to see
    > >> unsatisfactorily small performance gains when upgrading their
    > >> networking and storage hardware.
    > >>
    > >> Crail solves this problem by providing an efficient storage platform
    > >> built upon user-level I/O, thus bypassing layers such as the JVM and
    > >> OS during I/O operations. Moreover, Crail directly leverages the
    > >> specific hardware features of RDMA and NVMe to provide a better
    > >> integration with high-level data operations in Apache compute
    > >> frameworks. As a consequence, Crail enables users to run larger, more
    > >> complex queries against ever increasing amounts of data at a speed
    > >> largely determined by the deployed hardware. Crail is a generic
    > >> solution that integrates well with the Apache ecosystem, including
    > >> frameworks like Spark, Hadoop, Hive, etc.
    > >>
    > >> Initial Goals
    > >>
    > >> The initial goal of moving Crail to the Apache Incubator is to broaden
    > >> the community and foster contributions from developers to leverage
    > >> Crail in various data processing frameworks and workloads. Ultimately,
    > >> the goal for Crail is to become the de facto standard platform for
    > >> storing temporary performance critical data in distributed data
    > >> processing systems.
    > >>
    > >> Current Status
    > >>
    > >> The initial code has been developed at the IBM Zurich Research Center
    > >> and has recently been made available on GitHub under the Apache
    > >> License 2.0. The project currently has explicit support for Spark and
    > >> Hadoop. Project documentation is available on the website
    > >> www.crail.io. There is also a public forum for discussions related to
    > >> Crail available at
    > >> https://groups.google.com/forum/#!forum/zrlio-users.
    > >>
    > >> Meritocracy
    > >>
    > >> The current developers are familiar with the meritocratic open source
    > >> development process at Apache. Over the last year, the project has
    > >> gathered interest on GitHub, and several companies have already
    > >> expressed interest in the project. We plan to invest in supporting a
    > >> meritocracy by inviting additional developers to participate.
    > >>
    > >> Community
    > >>
    > >> The need for a generic solution to integrate high-performance I/O
    > >> hardware in open source is tremendous, so there is potential for a
    > >> very large community. We believe that Crail’s extensible architecture
    > >> and its alignment with the Apache ecosystem will further encourage
    > >> community participation. We expect that over time Crail will attract
    > >> a large community.
    > >>
    > >> Alignment
    > >>
    > >> Crail is written in Java and is built for the Apache data processing
    > >> ecosystem. The basic storage services of Crail can be used seamlessly
    > >> from Spark, Hadoop, and Storm. The enhanced storage services require
    > >> dedicated bindings specific to each data processing framework, which
    > >> are currently available only for Spark. We think that moving Crail to
    > >> the Apache Incubator will help to extend Crail’s support for different
    > >> data processing frameworks.
    > >>
    > >> Known Risks
    > >>
    > >> To date, development has been sponsored by IBM and coordinated mostly
    > >> by the core team of researchers at the IBM Zurich Research Center. For
    > >> Crail to fully transition to an "Apache Way" governance model, it
    > >> needs to start embracing the meritocracy-centric way of growing the
    > >> community of contributors.
    > >>
    > >> Orphaned Products
    > >>
    > >> The Crail developers have a long-term interest in the use and
    > >> maintenance of the code, and there is also hope that growing a diverse
    > >> community around the project will become a guarantee against the
    > >> project becoming orphaned. We feel that it is also important to put
    > >> formal governance in place both for the project and the contributors
    > >> as the project expands. We feel the ASF is the best location for this.
    > >>
    > >> Inexperience with Open Source
    > >>
    > >> Several of the initial committers are experienced open source
    > >> developers (Linux Kernel, DPDK, etc.).
    > >>
    > >> Relationships with Other Apache Products
    > >>
    > >> As of now, Crail has been tested with Spark, Hadoop and Hive, but it is
    > >> designed to integrate with any of the Apache data processing
    > >> frameworks.
    > >>
    > >> Homogeneous Developers
    > >>
    > >> The project already has a diverse developer base including
    > >> contributions from organizations and public developers.
    > >>
    > >> An Excessive Fascination with the Apache Brand
    > >>
    > >> Crail solves a real need for a generic approach to leverage modern
    > >> network and storage hardware effectively in the Apache Hadoop and
    > >> Spark ecosystems. Our rationale for developing Crail as an Apache
    > >> project is detailed in the Rationale section. We believe that the
    > >> Apache brand and community process will help us engage a larger
    > >> community and facilitate closer ties with various Apache data
    > >> processing projects.
    > >>
    > >> Documentation
    > >>
    > >> Documentation regarding Crail is available at www.crail.io
    > >>
    > >> Initial Source
    > >>
    > >> Initial source is available on GitHub under the Apache License 2.0:
    > >>
    > >> https://github.com/zrlio/crail
    > >>
    > >> External Dependencies
    > >>
    > >> Crail is written in Java and currently supports Apache Hadoop MapReduce
    > >> and Apache Spark runtimes. To the best of our knowledge, all
    > dependencies
    > >> of Crail are distributed under Apache compatible licenses.
    > >>
    > >> Required Resources
    > >>
    > >> Mailing lists
    > >>
    > >> priv...@crail.incubator.apache.org
    > >> d...@crail.incubator.apache.org
    > >> comm...@crail.incubator.apache.org
    > >>
    > >> Git repository
    > >>
    > >> https://git-wip-us.apache.org/repos/asf/incubator-crail.git
    > >>
    > >> Issue Tracking
    > >>
    > >> JIRA (Crail)
    > >>
    > >> Initial Committers
    > >>
    > >> Patrick Stuedi <stu AT ibm DOT zurich DOT com>
    > >> Animesh Trivedi <atr AT ibm DOT zurich DOT com>
    > >> Jonas Pfefferle <jpf AT ibm DOT zurich DOT com>
    > >> Bernard Metzler <bmt AT ibm DOT zurich DOT com>
    > >> Michael Kaufmann <kau AT ibm DOT zurich DOT com>
    > >> Adrian Schuepbach <dri AT ibm DOT zurich DOT com>
    > >> Patrick McArthur <patrick AT patrickmcarthur DOT net>
    > >> Ana Klimovic <anakli AT stanford DOT edu>
    > >> Yuval Degani <yuvaldeg AT mellanox DOT com>
    > >> Vu Pham <vuhuong AT mellanox DOT com>
    > >>
    > >> Affiliations
    > >>
    > >> IBM (Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, Bernard Metzler,
    > >> Michael Kaufmann, Adrian Schuepbach)
    > >> University of New Hampshire (Patrick McArthur)
    > >> Stanford University (Ana Klimovic)
    > >> Mellanox (Yuval Degani, Vu Pham)
    > >>
    > >> Sponsors
    > >>
    > >> Champion
    > >>
    > >> Luciano Resende <lresende AT apache DOT org>
    > >>
    > >> Nominated Mentors
    > >>
    > >> Luciano Resende <lresende AT apache DOT org>
    > >>
    > >> Raphael Bircher <rbircher AT apache DOT org>
    > >>
    > >> Julian Hyde <jhyde AT apache DOT org>
    > >>
    > >> Sponsoring Entity
    > >>
    > >> We would like to propose that the Apache Incubator sponsor this project.
    > >>
    > >>
    > >> --
    > >> Luciano Resende
    > >> http://twitter.com/lresende1975
    > >> http://lresende.blogspot.com/
    > >>
    > >
    > >
    > >
    > > --
    > > Luciano Resende
    > > http://twitter.com/lresende1975
    > > http://lresende.blogspot.com/
    >
    >
    >
    > --
    > Clebert Suconic
    >
    > ---------------------------------------------------------------------
    > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
    > For additional commands, e-mail: general-h...@incubator.apache.org
    >
    >
    


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
