Of course, my +1

On Thu, Oct 26, 2017 at 12:31 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> Now that the discussion thread on the Crail proposal has ended, please
> vote on accepting Crail into the Apache Incubator.
>
> The ASF voting rules are described at:
> http://www.apache.org/foundation/voting.html
>
> A vote for accepting a new Apache Incubator podling is a majority vote
> for which only Incubator PMC member votes are binding.
>
> Votes from other people are also welcome as an indication of people's
> enthusiasm (or lack thereof).
>
> Please do not use this VOTE thread for discussions.
> If needed, start a new thread instead.
>
> This vote will run for at least 72 hours. Please VOTE as follows:
>
> [ ] +1 Accept Crail into the Apache Incubator
> [ ] +0 Abstain.
> [ ] -1 Do not accept Crail into the Apache Incubator because ...
>
> The proposal below is also on the wiki:
> https://wiki.apache.org/incubator/CrailProposal
>
> ===
>
> Abstract
>
> Crail is a storage platform for sharing performance-critical data in
> distributed data processing jobs at very high speed. Crail is built
> entirely upon principles of user-level I/O and specifically targets data
> center deployments with fast network and storage hardware (e.g., 100Gbps
> RDMA, plenty of DRAM, NVMe flash, etc.) as well as new modes of operation
> such as resource disaggregation or serverless computing. Crail is written
> in Java and integrates seamlessly with the Apache data processing
> ecosystem. It can be used as a backbone to accelerate high-level data
> operations such as shuffle or broadcast, as a cache to store hot data that
> is queried repeatedly, or as a storage platform for sharing inter-job data
> in complex multi-job pipelines.
>
> Proposal
>
> Crail enables Apache data processing frameworks to run efficiently in
> next-generation data centers using fast storage and network hardware in
> combination with resource (e.g., DRAM, flash) disaggregation.
>
> Background
>
> Crail started as a research project at the IBM Zurich Research Laboratory
> around 2014, aiming to integrate high-speed I/O hardware effectively into
> large-scale data processing systems.
>
> Rationale
>
> During the last decade, I/O hardware has undergone rapid performance
> improvements, typically by orders of magnitude. Modern networking and
> storage hardware can deliver 100+ Gbps (10+ GBps) bandwidth with access
> latencies of a few microseconds. However, despite such progress in raw I/O
> performance, effectively leveraging modern hardware in data processing
> frameworks remains challenging. In most cases, upgrading to high-end
> networking or storage hardware has very little effect on the performance
> of analytics workloads. The problem comes from heavily layered software
> imposing overheads such as deep call stacks, unnecessary data copies,
> thread contention, etc. These problems have already been addressed at the
> operating system level with new I/O APIs such as RDMA verbs, NVMe, etc.,
> allowing applications to bypass software layers during I/O operations.
> Distributed data processing frameworks, on the other hand, are typically
> implemented on legacy I/O interfaces such as sockets or block storage.
> These interfaces have been shown to be insufficient to deliver the full
> hardware performance. Yet, to the best of our knowledge, there are no
> active and systematic efforts to integrate these new user-level I/O APIs
> into Apache software frameworks. This problem affects all end-users and
> organizations that use Apache software. We expect them to see only
> unsatisfactorily small performance gains when upgrading their networking
> and storage hardware.
>
> Crail solves this problem by providing an efficient storage platform built
> upon user-level I/O, thus bypassing layers such as the JVM and the OS
> during I/O operations. Moreover, Crail directly leverages the specific
> hardware features of RDMA and NVMe to provide a better integration with
> high-level data operations in Apache compute frameworks. As a consequence,
> Crail enables users to run larger, more complex queries against
> ever-increasing amounts of data at a speed largely determined by the
> deployed hardware. Crail is a generic solution that integrates well with
> the Apache ecosystem, including frameworks like Spark, Hadoop, Hive, etc.
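>
> As an illustration, the minimal sketch below shows how an application
> could read Crail-managed data through the standard Hadoop FileSystem API,
> assuming Crail's HDFS adapter is on the classpath and the crail:// scheme
> is mapped to it in core-site.xml; the host name, port, and path are
> placeholders, not defaults taken from the project.
>
>     import java.net.URI;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FSDataInputStream;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     public class CrailReadSketch {
>         public static void main(String[] args) throws Exception {
>             // Assumes Crail's HDFS adapter is installed and crail:// is
>             // configured in core-site.xml; host/port/path are placeholders.
>             Configuration conf = new Configuration();
>             FileSystem fs = FileSystem.get(
>                     URI.create("crail://crail-namenode:9060/"), conf);
>             try (FSDataInputStream in = fs.open(new Path("/tmp/shuffle/part-00000"))) {
>                 byte[] buf = new byte[1 << 16];
>                 long total = 0;
>                 int n;
>                 while ((n = in.read(buf)) > 0) {
>                     total += n; // bytes are served by Crail from DRAM/NVMe
>                 }
>                 System.out.println("read " + total + " bytes");
>             }
>             fs.close();
>         }
>     }
>
> Because the access path is the familiar FileSystem API, existing
> Hadoop-based applications need only configuration changes to benefit.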
> Initial Goals
>
> The initial goal of moving Crail to the Apache Incubator is to broaden the
> community and foster contributions from developers to leverage Crail in
> various data processing frameworks and workloads. Ultimately, the goal for
> Crail is to become the de-facto standard platform for storing temporary,
> performance-critical data in distributed data processing systems.
>
> Current Status
>
> The initial code has been developed at the IBM Zurich Research Laboratory
> and has recently been made available on GitHub under the Apache Software
> License 2.0. The project currently has explicit support for Spark and
> Hadoop. Project documentation is available on the website www.crail.io.
> There is also a public forum for discussions related to Crail available at
> https://groups.google.com/forum/#!forum/zrlio-users.
>
> Meritocracy
>
> The current developers are familiar with the meritocratic open source
> development process at Apache. Over the last year, the project has
> gathered interest on GitHub, and several companies have already expressed
> interest in the project. We plan to invest in supporting a meritocracy by
> inviting additional developers to participate.
>
> Community
>
> The need for a generic solution to integrate high-performance I/O hardware
> into open source software is tremendous, so there is potential for a very
> large community. We believe that Crail's extensible architecture and its
> alignment with the Apache ecosystem will further encourage community
> participation. We expect that over time Crail will attract a large
> community.
>
> Alignment
>
> Crail is written in Java and is built for the Apache data processing
> ecosystem. The basic storage services of Crail can be used seamlessly from
> Spark, Hadoop, or Storm. The enhanced storage services require dedicated,
> framework-specific bindings, which are currently available only for Spark
> (see the configuration sketch below). We think that moving Crail to the
> Apache Incubator will help to extend Crail's support for different data
> processing frameworks.
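>
> As a sketch of how such a Spark binding would typically be enabled, the
> example below wires a Crail-backed shuffle manager in through Spark's
> standard configuration mechanism. The shuffle-manager class name is a
> placeholder used for illustration only; the actual name is defined by
> Crail's Spark plugin.
>
>     import java.util.Arrays;
>     import org.apache.spark.SparkConf;
>     import org.apache.spark.api.java.JavaSparkContext;
>     import scala.Tuple2;
>
>     public class CrailShuffleSketch {
>         public static void main(String[] args) {
>             // Illustrative only: the class name below is a placeholder for
>             // the shuffle manager shipped with Crail's Spark binding.
>             SparkConf conf = new SparkConf()
>                     .setAppName("crail-shuffle-sketch")
>                     .set("spark.shuffle.manager",
>                          "org.apache.spark.shuffle.crail.CrailShuffleManager");
>             try (JavaSparkContext sc = new JavaSparkContext(conf)) {
>                 // Any shuffle-heavy stage (groupByKey, joins, sorts) now moves
>                 // its intermediate data through Crail instead of local disk
>                 // and sockets.
>                 sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6))
>                   .mapToPair(x -> new Tuple2<>(x % 2, x))
>                   .groupByKey()
>                   .collect();
>             }
>         }
>     }
>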
> Known Risks
>
> To date, development has been sponsored by IBM and coordinated mostly by
> the core team of researchers at the IBM Zurich Research Laboratory. For
> Crail to fully transition to an "Apache Way" governance model, it needs to
> start embracing the meritocracy-centric way of growing the community of
> contributors.
>
> Orphaned Products
>
> The Crail developers have a long-term interest in the use and maintenance
> of the code, and growing a diverse community around the project will
> further guard against it becoming orphaned. We feel that it is also
> important to put formal governance in place for both the project and the
> contributors as the project expands, and that the ASF is the best place
> for this.
>
> Inexperience with Open Source
>
> Several of the initial committers are experienced open source developers
> (Linux kernel, DPDK, etc.).
>
> Relationships with Other Apache Products
>
> As of now, Crail has been tested with Spark, Hadoop, and Hive, but it is
> designed to integrate with any of the Apache data processing frameworks.
>
> Homogeneous Developers
>
> The project already has a diverse developer base, including contributions
> from organizations and independent developers.
>
> An Excessive Fascination with the Apache Brand
>
> Crail solves a real need for a generic approach to leveraging modern
> network and storage hardware effectively in the Apache Hadoop and Spark
> ecosystems. Our rationale for developing Crail as an Apache project is
> detailed in the Rationale section. We believe that the Apache brand and
> community process will help us to engage a larger community and facilitate
> closer ties with various Apache data processing projects.
>
> Documentation
>
> Documentation regarding Crail is available at www.crail.io.
>
> Initial Source
>
> The initial source is available on GitHub under the Apache License 2.0:
>
> https://github.com/zrlio/crail
>
> External Dependencies
>
> Crail is written in Java and currently supports the Apache Hadoop
> MapReduce and Apache Spark runtimes. To the best of our knowledge, all
> dependencies of Crail are distributed under Apache-compatible licenses.
>
> Required Resources
>
> Mailing lists
>
> priv...@crail.incubator.apache.org
> d...@crail.incubator.apache.org
> comm...@crail.incubator.apache.org
>
> Git repository
>
> https://git-wip-us.apache.org/repos/asf/incubator-crail.git
>
> Issue Tracking
>
> JIRA (Crail)
>
> Initial Committers
>
> Patrick Stuedi <stu AT ibm DOT zurich DOT com>
> Animesh Trivedi <atr AT ibm DOT zurich DOT com>
> Jonas Pfefferle <jpf AT ibm DOT zurich DOT com>
> Bernard Metzler <bmt AT ibm DOT zurich DOT com>
> Michael Kaufmann <kau AT ibm DOT zurich DOT com>
> Adrian Schuepbach <dri AT ibm DOT zurich DOT com>
> Patrick McArthur <patrick AT patrickmcarthur DOT net>
> Ana Klimovic <anakli AT stanford DOT edu>
> Yuval Degani <yuvaldeg AT mellanox DOT com>
> Vu Pham <vuhuong AT mellanox DOT com>
>
> Affiliations
>
> IBM (Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, Bernard Metzler,
> Michael Kaufmann, Adrian Schuepbach)
> University of New Hampshire (Patrick McArthur)
> Stanford University (Ana Klimovic)
> Mellanox (Yuval Degani, Vu Pham)
>
> Sponsors
>
> Champion
>
> Luciano Resende <lresende AT apache DOT org>
>
> Nominated Mentors
>
> Luciano Resende <lresende AT apache DOT org>
> Raphael Bircher <rbircher AT apache DOT org>
> Julian Hyde <jhyde AT apache DOT org>
>
> Sponsoring Entity
>
> We propose that the Apache Incubator sponsor this project.
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/

--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/