+1 (Binding)

By the way, it is great to see another Scala-based project coming to Apache.
Dick
VP Apache ESME

On Wed, Jun 29, 2011 at 12:35 PM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:

> +1 (binding)
> Tommaso
>
> 2011/6/28 Jun Rao <jun...@gmail.com>
>
>> Hi all,
>>
>> Since the discussion on the thread of the Kafka incubator proposal is
>> winding down, I'd like to call a vote.
>>
>> At the end of this mail, I've put a copy of the current proposal. Here is
>> a link to the document in the wiki:
>> http://wiki.apache.org/incubator/KafkaProposal
>>
>> And here is a link to the discussion thread:
>> http://www.mail-archive.com/general@incubator.apache.org/msg29594.html
>>
>> Please cast your votes:
>>
>> [ ] +1 Accept Kafka for incubation
>> [ ] +0 Indifferent to Kafka incubation
>> [ ] -1 Reject Kafka for incubation
>>
>> This vote will close 72 hours from now.
>>
>> Thanks,
>>
>> Jun
>>
>> == Abstract ==
>> Kafka is a distributed publish-subscribe system for processing large
>> amounts of streaming data.
>>
>> == Proposal ==
>> Kafka provides an extremely high throughput distributed publish/subscribe
>> messaging system. Additionally, it supports relatively long term
>> persistence of messages to support a wide variety of consumers,
>> partitioning of the message stream across servers and consumers, and
>> functionality for loading data into Apache Hadoop for offline, batch
>> processing.
>>
>> == Background ==
>> Kafka was developed at LinkedIn to process the large amounts of events
>> generated by that company's website and provide a common repository for
>> many types of consumers to access and process those events. Kafka has been
>> used in production at LinkedIn scale to handle dozens of types of events
>> including page views, searches and social network activity. Kafka clusters
>> at LinkedIn currently process more than two billion events per day.
>>
>> Kafka fills the gap between messaging systems such as Apache ActiveMQ,
>> which provide low latency message delivery but don't focus on throughput,
>> and log processing systems such as Scribe and Flume, which do not provide
>> adequate latency for our diverse set of consumers. Kafka can also be
>> inserted into traditional log-processing systems, acting as an
>> intermediate step before further processing. Kafka focuses relentlessly
>> on performance and throughput by neither introspecting message content
>> nor indexing messages on the broker. We also achieve high performance by
>> depending on Java's sendFile/transferTo capabilities to minimize
>> intermediate buffer copies and relying on the OS's pagecache to
>> efficiently serve up message contents to consumers. Kafka is also designed
>> to be scalable and it depends on Apache ZooKeeper for coordination amongst
>> its producers, brokers and consumers.
>>
>> Kafka is written in Scala. It was developed internally at LinkedIn to meet
>> our particular use cases, but will be useful to many organizations facing a
>> similar need to reliably process large amounts of streaming data.
>> Therefore, we would like to share it with the ASF and begin developing a
>> community of developers and users within Apache.
>>
>> == Rationale ==
>> Many organizations can benefit from a reliable stream processing system
>> such as Kafka. While our use case of processing events from a very large
>> website like LinkedIn has driven the design of Kafka, its uses are varied
>> and we expect many new use cases to emerge. Kafka provides a natural bridge
>> between near real-time event processing and offline batch processing and
>> will appeal to many users.
>>
>> == Current Status ==
>> === Meritocracy ===
>> Our intent with this incubator proposal is to start building a diverse
>> developer community around Kafka following the Apache meritocracy model.
>> Since Kafka was open sourced, we have solicited contributions via the
>> website and presentations given to user groups and technical audiences. We
>> have had positive responses to these and have received several
>> contributions and clients for other languages. We plan to continue this
>> support for new contributors and work with those who contribute
>> significantly to the project to make them committers.
>>
>> === Community ===
>> Kafka is currently being developed by engineers within LinkedIn and used
>> in production at that company. Additionally, we have active users in or
>> have received contributions from a diverse set of companies including
>> MediaSift, SocialTwist, Clearspring and Urban Airship. Recent public
>> presentations of Kafka and its goals garnered much interest from potential
>> contributors. We hope to extend our contributor base significantly and
>> invite all those who are interested in building high-throughput distributed
>> systems to participate. We have begun receiving contributions from outside
>> of LinkedIn, including clients for several languages such as Ruby, PHP,
>> Clojure, .NET and Python.
>>
>> To further this goal, we use GitHub issue tracking and branching
>> facilities, as well as maintaining a public mailing list via Google Groups.
>>
>> === Core Developers ===
>> Kafka is currently being developed by four engineers at LinkedIn: Neha
>> Narkhede, Jun Rao, Jakob Homan and Jay Kreps. Jun has experience within
>> Apache as a Cassandra committer and PMC member. Neha has been an active
>> contributor to several projects LinkedIn has open sourced, including Bobo,
>> Sensei and Zoie. Jay has experience with open source software as the
>> originator of Project Voldemort, as well as being active within the
>> Hadoop ecosystem community. Jakob is an Apache Hadoop committer and PMC
>> member and a previous Apache ZooKeeper contributor.
>>
>> === Alignment ===
>> The ASF is the natural choice to host the Kafka project as its goal of
>> encouraging community-driven open-source projects fits with our vision for
>> Kafka. Additionally, many other projects with which we are familiar and
>> with which we expect Kafka to integrate, such as Apache Hadoop, Pig,
>> ZooKeeper and log4j, are hosted by the ASF, and we will benefit and provide
>> benefit by close proximity to them.
>>
>> == Known Risks ==
>> === Orphaned Products ===
>> The core developers plan to work full time on the project. There is very
>> little risk of Kafka being abandoned as it is a critical part of LinkedIn's
>> internal infrastructure and is in production use.
>>
>> === Inexperience with Open Source ===
>> All of the core developers have experience with open source development.
>> LinkedIn open sourced Kafka several months ago and has been receiving
>> contributions since. Jun is an Apache Cassandra committer and PMC member.
>> Jay and Neha have been involved with several open source projects released
>> by LinkedIn. Jakob has been actively involved with the ASF as a full-time
>> Hadoop committer and PMC member.
>>
>> === Homogeneous Developers ===
>> The current core developers are all from LinkedIn. However, we hope to
>> establish a developer community that includes contributors from several
>> corporations and we are actively encouraging new contributors via the
>> mailing lists and public presentations of Kafka.
>>
>> === Reliance on Salaried Developers ===
>> Currently, the developers are paid to do work on Kafka. However, once the
>> project has a community built around it, we expect to get committers,
>> developers and community from outside the current core developers. Even
>> so, because LinkedIn relies on Kafka internally, the reliance on salaried
>> developers is unlikely to change.
>>
>> === Relationships with Other Apache Products ===
>> Kafka is deeply integrated with Apache products.
>> Kafka uses Apache ZooKeeper to coordinate its state amongst the brokers,
>> consumers, and soon, the producers. Kafka provides input formats to allow
>> Hadoop MapReduce to load data directly from Kafka. Kafka provides an
>> appender to allow consuming data directly from Apache log4j.
>>
>> === An Excessive Fascination with the Apache Brand ===
>> While we respect the reputation of the Apache brand and have no doubts that
>> it will attract contributors and users, our interest is primarily to give
>> Kafka a solid home as an open source project following an established
>> development model. We have also given reasons in the Rationale and
>> Alignment sections.
>>
>> == Documentation ==
>> Information about Kafka can be found at [http://sna-projects.com/kafka/].
>> The following links provide more information about the project:
>>
>> * Kafka roadmap and goals: [http://sna-projects.com/kafka/projects.php]
>> * The GitHub site: [https://github.com/kafka-dev/kafka]
>> * Kafka overview from Jay Kreps:
>>   [http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation]
>> * Kafka overview from Jakob Homan: [http://bit.ly/fLmoZz]
>> * Kafka paper at NetDB 2011:
>>   [http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf]
>>
>> == Initial Source ==
>> Kafka has been under development at LinkedIn since November 2009. It was
>> open sourced by LinkedIn in January 2011. It is currently hosted on GitHub
>> under the Apache license at [https://github.com/kafka-dev/kafka].
>>
>> Kafka is mainly written in Scala with some performance testing code in
>> Java. Several clients have been contributed in other languages, including
>> Ruby, PHP, Clojure, .NET and Python. Its source tree is entirely self
>> contained and relies on Simple Build Tool (sbt) as its build system and
>> dependency resolution mechanism.
>>
>> == External Dependencies ==
>> The dependencies all have Apache compatible licenses.
>>
>> == Cryptography ==
>> Not applicable.
>>
>> == Required Resources ==
>> === Mailing Lists ===
>> * kafka-private for private PMC discussions (with moderated subscriptions)
>> * kafka-dev
>> * kafka-commits
>> * kafka-user
>>
>> === Subversion Directory ===
>> [https://svn.apache.org/repos/asf/incubator/kafka]
>>
>> === Issue Tracking ===
>> JIRA Kafka (KAFKA)
>>
>> === Other Resources ===
>> The existing code already has unit tests, so we would like a Hudson
>> instance to run them whenever a new patch is submitted. This can be added
>> after project creation.
>>
>> == Initial Committers ==
>> * Jay Kreps
>> * Jun Rao
>> * Neha Narkhede
>> * Jakob Homan
>> * Phillip Rhodes
>> * Henry Saputra
>> * Chris Burroughs
>>
>> == Affiliations ==
>> * Jay Kreps (LinkedIn)
>> * Jun Rao (LinkedIn)
>> * Neha Narkhede (LinkedIn)
>> * Jakob Homan (LinkedIn)
>> * Phillip Rhodes (Fogbeam Labs)
>> * Henry Saputra (Cisco Systems)
>> * Chris Burroughs (Clearspring Technologies)
>>
>> == Sponsors ==
>> === Champion ===
>> Chris Douglas (Apache Member)
>>
>> === Nominated Mentors ===
>> * Alan Cabrera (Apache Member)
>> * Geir Magnusson, Jr. (Apache Member and Director)
>> * Owen O'Malley (Apache Member)
>>
>> === Sponsoring Entity ===
>> We are requesting the Incubator to sponsor this project.