+1

On Thu, Oct 26, 2017 at 12:01 PM, Luciano Resende <luckbr1...@gmail.com> wrote:

> Of course, my +1
>
> On Thu, Oct 26, 2017 at 12:31 PM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
>
>> Now that the discussion thread on the Crail proposal has ended, please
>> vote on accepting Crail into the Apache Incubator.
>>
>> The ASF voting rules are described at:
>> http://www.apache.org/foundation/voting.html
>>
>> A vote for accepting a new Apache Incubator podling is a majority vote
>> for which only Incubator PMC member votes are binding.
>>
>> Votes from other people are also welcome as an indication of people's
>> enthusiasm (or lack thereof).
>>
>> Please do not use this VOTE thread for discussions.
>> If needed, start a new thread instead.
>>
>> This vote will run for at least 72 hours. Please VOTE as follows:
>> [] +1 Accept Crail into the Apache Incubator
>> [] +0 Abstain.
>> [] -1 Do not accept Crail into the Apache Incubator because ...
>>
>> The proposal below is also on the wiki:
>> https://wiki.apache.org/incubator/CrailProposal
>>
>> ===
>>
>> Abstract
>>
>> Crail is a storage platform for sharing performance-critical data in
>> distributed data processing jobs at very high speed. Crail is built
>> entirely upon principles of user-level I/O and specifically targets data
>> center deployments with fast network and storage hardware (e.g., 100Gbps
>> RDMA, plenty of DRAM, NVMe flash, etc.) as well as new modes of operation
>> such as resource disaggregation or serverless computing. Crail is written
>> in Java and integrates seamlessly with the Apache data processing
>> ecosystem. It can be used, for instance, as a backbone to accelerate
>> high-level data operations such as shuffle or broadcast, as a cache to
>> store hot data that is queried repeatedly, or as a storage platform for
>> sharing inter-job data in complex multi-job pipelines.
>>
>> Proposal
>>
>> Crail enables Apache data processing frameworks to run efficiently in
>> next-generation data centers using fast storage and network hardware in
>> combination with resource (e.g., DRAM, Flash) disaggregation.
>>
>> Background
>>
>> Crail started as a research project at the IBM Zurich Research Laboratory
>> around 2014, aiming to integrate high-speed I/O hardware effectively into
>> large-scale data processing systems.
>>
>> Rationale
>>
>> During the last decade, I/O hardware has undergone rapid performance
>> improvements, typically by orders of magnitude. Modern-day networking
>> and storage hardware can deliver 100+ Gbps (10+ GBps) bandwidth with
>> access latencies of a few microseconds. However, despite such progress in
>> raw I/O performance, effectively leveraging modern hardware in data
>> processing frameworks remains challenging. In most cases, upgrading to
>> high-end networking or storage hardware has very little effect on the
>> performance of analytics workloads. The problem comes from heavily layered
>> software imposing overheads such as deep call stacks, unnecessary data
>> copies, thread contention, etc. These problems have already been addressed
>> at the operating system level with new I/O APIs such as RDMA verbs, NVMe,
>> etc., allowing applications to bypass software layers during I/O
>> operations. Distributed data processing frameworks, on the other hand, are
>> typically implemented on legacy I/O interfaces such as sockets or block
>> storage. These interfaces have been shown to be insufficient to deliver
>> the full hardware performance. Yet, to the best of our knowledge, there
>> are no active and systematic efforts to integrate these new user-level I/O
>> APIs into Apache software frameworks. This problem affects all end users
>> and organizations that use Apache software. We expect them to see
>> unsatisfactorily small performance gains when upgrading their networking
>> and storage hardware.
>>
>> Crail solves this problem by providing an efficient storage platform built
>> upon user-level I/O, thus bypassing layers such as the JVM and OS during
>> I/O operations. Moreover, Crail directly leverages the specific hardware
>> features of RDMA and NVMe to provide better integration with high-level
>> data operations in Apache compute frameworks. As a consequence, Crail
>> enables users to run larger, more complex queries against ever-increasing
>> amounts of data at a speed largely determined by the deployed hardware.
>> Crail is a generic solution that integrates well with the Apache
>> ecosystem, including frameworks like Spark, Hadoop, Hive, etc.
>>
>> Initial Goals
>>
>> The initial goal of moving Crail to the Apache Incubator is to broaden the
>> community and foster contributions from developers to leverage Crail in
>> various data processing frameworks and workloads. Ultimately, the goal for
>> Crail is to become the de facto standard platform for storing temporary,
>> performance-critical data in distributed data processing systems.
>>
>> Current Status
>>
>> The initial code has been developed at the IBM Zurich Research Center and
>> has recently been made available on GitHub under the Apache Software
>> License 2.0. The project currently has explicit support for Spark and
>> Hadoop. Project documentation is available on the website www.crail.io.
>> There is also a public forum for discussions related to Crail available at
>> https://groups.google.com/forum/#!forum/zrlio-users.
>>
>> Meritocracy
>>
>> The current developers are familiar with the meritocratic open source
>> development process at Apache. Over the last year, the project has
>> gathered interest on GitHub, and several companies have already expressed
>> interest in the project. We plan to invest in supporting a meritocracy by
>> inviting additional developers to participate.
>>
>> Community
>>
>> The need for a generic solution to integrate high-performance I/O hardware
>> in open source is tremendous, so there is potential for a very large
>> community. We believe that Crail’s extensible architecture and its
>> alignment with the Apache ecosystem will further encourage community
>> participation. We expect that over time Crail will attract a large
>> community.
>>
>> Alignment
>>
>> Crail is written in Java and is built for the Apache data processing
>> ecosystem. The basic storage services of Crail can be used seamlessly from
>> Spark, Hadoop, and Storm. The enhanced storage services require dedicated,
>> framework-specific bindings, which are currently available only for Spark.
>> We think that moving Crail to the Apache Incubator will help to extend
>> Crail’s support for different data processing frameworks.
>>
>> Known Risks
>>
>> To date, development has been sponsored by IBM and coordinated mostly by
>> the core team of researchers at the IBM Zurich Research Center. For Crail
>> to fully transition to an "Apache Way" governance model, it needs to start
>> embracing the meritocracy-centric way of growing the community of
>> contributors.
>>
>> Orphaned Products
>>
>> The Crail developers have a long-term interest in the use and maintenance
>> of the code, and there is also hope that growing a diverse community
>> around the project will become a guarantee against the project becoming
>> orphaned. We feel that it is also important to put formal governance in
>> place both for the project and the contributors as the project expands.
>> We feel the ASF is the best location for this.
>>
>> Inexperience with Open Source
>>
>> Several of the initial committers are experienced open source developers
>> (Linux Kernel, DPDK, etc.).
>>
>> Relationships with Other Apache Products
>>
>> As of now, Crail has been tested with Spark, Hadoop, and Hive, but it is
>> designed to integrate with any of the Apache data processing frameworks.
>>
>> Homogeneous Developers
>>
>> The project already has a diverse developer base, including contributions
>> from organizations and public developers.
>>
>> An Excessive Fascination with the Apache Brand
>>
>> Crail solves a real need for a generic approach to leverage modern network
>> and storage hardware effectively in the Apache Hadoop and Spark ecosystems.
>> Our rationale for developing Crail as an Apache project is detailed in the
>> Rationale section. We believe that the Apache brand and community process
>> will help us to engage a larger community and facilitate closer ties with
>> various Apache data processing projects.
>>
>> Documentation
>>
>> Documentation regarding Crail is available at www.crail.io.
>>
>> Initial Source
>>
>> Initial source is available on GitHub under the Apache License 2.0:
>>
>> https://github.com/zrlio/crail
>>
>> External Dependencies
>>
>> Crail is written in Java and currently supports Apache Hadoop MapReduce
>> and Apache Spark runtimes. To the best of our knowledge, all dependencies
>> of Crail are distributed under Apache-compatible licenses.
>>
>> Required Resources
>>
>> Mailing lists
>>
>> priv...@crail.incubator.apache.org
>> d...@crail.incubator.apache.org
>> comm...@crail.incubator.apache.org
>>
>> Git repository
>>
>> https://git-wip-us.apache.org/repos/asf/incubator-crail.git
>>
>> Issue Tracking
>>
>> JIRA (Crail)
>>
>> Initial Committers
>>
>> Patrick Stuedi <stu AT ibm DOT zurich DOT com>
>> Animesh Trivedi <atr AT ibm DOT zurich DOT com>
>> Jonas Pfefferle <jpf AT ibm DOT zurich DOT com>
>> Bernard Metzler <bmt AT ibm DOT zurich DOT com>
>> Michael Kaufmann <kau AT ibm DOT zurich DOT com>
>> Adrian Schuepbach <dri AT ibm DOT zurich DOT com>
>> Patrick McArthur <patrick AT patrickmcarthur DOT net>
>> Ana Klimovic <anakli AT stanford DOT edu>
>> Yuval Degani <yuvaldeg AT mellanox DOT com>
>> Vu Pham <vuhuong AT mellanox DOT com>
>>
>> Affiliations
>>
>> IBM (Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, Bernard Metzler,
>> Michael Kaufmann, Adrian Schuepbach)
>> University of New Hampshire (Patrick McArthur)
>> Stanford University (Ana Klimovic)
>> Mellanox (Yuval Degani, Vu Pham)
>>
>> Sponsors
>>
>> Champion
>>
>> Luciano Resende <lresende AT apache DOT org>
>>
>> Nominated Mentors
>>
>> Luciano Resende <lresende AT apache DOT org>
>>
>> Raphael Bircher <rbircher AT apache DOT org>
>>
>> Julian Hyde <jhyde AT apache DOT org>
>>
>> Sponsoring Entity
>>
>> We would like to propose that the Apache Incubator sponsor this project.
>>
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>>
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/

--
Clebert Suconic