+1 ! - milind
On 9/9/11 9:22 AM, "Doug Cutting" <cutt...@apache.org> wrote: >It's been a week since the Accumulo proposal was submitted for >discussion. A few questions were asked, and the proposal was clarified >in response. Sufficient mentors have volunteered. I thus feel we are >now ready for a vote. > >The latest proposal can be found at the end of this email and at: > > http://wiki.apache.org/incubator/AccumuloProposal > >The discussion regarding the proposal can be found at: > > http://s.apache.org/oi > >Please cast your votes: > >[ ] +1 Accept Accumulo for incubation >[ ] +0 Indifferent to Accumulo incubation >[ ] -1 Reject Accumulo for incubation > >This vote will close 72 hours from now. > >Thanks, > >Doug > >----------------------- > >= Accumulo Proposal = > >== Abstract == >Accumulo is a distributed key/value store that provides expressive, >cell-level access labels. > >== Proposal == >Accumulo is a sorted, distributed key/value store based on Google's >BigTable design. It is built on top of Apache Hadoop, Zookeeper, and >Thrift. It features a few novel improvements on the BigTable design in >the form of cell-level access labels and a server-side programming >mechanism that can modify key/value pairs at various points in the data >management process. > >== Background == >Google published the design of BigTable in 2006. Several other open >source projects have implemented aspects of this design including HBase, >CloudStore, and Cassandra. Accumulo began its development in 2008. > >== Rationale == >There is a need for a flexible, high performance distributed key/value >store that provides expressive, fine-grained access labels. The >communities we expect to be most interested in such a project are >government, health care, and other industries where privacy is a >concern. 
We have made much progress in developing this project over the >past 3 years and believe both the project and the interested communities >would benefit from this work being openly available and having open >development. > >== Current Status == > >=== Meritocracy === >We intend to strongly encourage the community to help with and >contribute to the code. We will actively seek potential committers and >help them become familiar with the codebase. > >=== Community === >A strong government community has developed around Accumulo and training >classes have been ongoing for about a year. Hundreds of developers use >Accumulo. > >=== Core Developers === >The developers are mainly employed by the National Security Agency, but >we anticipate interest developing among other companies. > >=== Alignment === >Accumulo is built on top of Hadoop, Zookeeper, and Thrift. It builds >with Maven. Due to the strong relationship with these Apache projects, >the incubator is a good match for Accumulo. > >== Known Risks == >=== Orphaned Products === >There is only a small risk of being orphaned. The community is >committed to improving the codebase of the project due to its fulfilling >needs not addressed by any other software. > >=== Inexperience with Open Source === >The codebase has been treated internally as an open source project since >its beginning, and the initial Apache committers have been involved with >the code for multiple years. While our experience with public open >source is limited, we do not anticipate difficulty in operating under >Apache's development process. > >=== Homogeneous Developers === >The committers have multiple employers and it is expected that >committers from different companies will be recruited. > >=== Reliance on Salaried Developers === >The initial committers are all paid by their employers to work on >Accumulo and we expect such employment to continue. Some of the initial >committers would continue as volunteers even if no longer employed to do >so. 
> >=== Relationships with Other Apache Products === >Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang, >-net, -io, -jci, -collections, -configuration, -logging, and -codec. > >=== Relationship to HBase === >Accumulo and HBase are both based on the design of Google's BigTable, so >there is a danger that potential users will have difficulty >distinguishing the two. Some of the key areas in which Accumulo differs >from HBase are discussed below. It may be possible to incorporate the >desired features of Accumulo into HBase. However, the amount of work >required would slow development of HBase and Accumulo considerably. We >believe this warrants a podling for Accumulo at the current time. We >expect active cross-pollination will occur between HBase and podling >Accumulo and it is possible that the codebases and projects will >ultimately converge. > >==== Access Labels ==== >Accumulo has an additional portion of its key that sorts after the >column qualifier and before the timestamp. It is called column >visibility and enables expressive cell-level access control. >Authorizations are passed with each query to control what data is >returned to the user. The column visibilities are boolean AND and OR >combinations of arbitrary strings (such as "(A&B)|C") and authorizations >are sets of strings (such as {C,D}). > >==== Iterators ==== >Accumulo has a novel server-side programming mechanism that can modify >the data written to disk or returned to the user. This mechanism can be >configured for any of the scopes where data is read from or written to >disk. It can be used to perform joins on data within a single tablet. > >==== Flexibility ==== >HBase requires the user to specify the set of column families to be used >up front. Accumulo places no restrictions on the column families. >Also, each column family in HBase is stored separately on disk. >Accumulo allows column families to be grouped together on disk, as does >BigTable. 
This enables users to configure how their data is stored, >potentially providing improvements in compression and lookup speeds. It >gives Accumulo a row/column hybrid nature, while HBase is currently >column-oriented. > >==== Testing ==== >Accumulo has testing frameworks that have resulted in its achieving a >high level of correctness and performance. We have observed that under >some configurations and conditions Accumulo will outperform HBase and >provide greater data integrity. > >==== Logging ==== >HBase uses a write-ahead log on the Hadoop Distributed File System. >Accumulo has its own logging service that does not depend on >communication with the HDFS NameNode. > >==== Storage ==== >Accumulo has a relative key file format that improves compression. > >==== Areas in which HBase features improvements over Accumulo ==== >in memory tables, upserts, coprocessors, connections to other projects >such as Cascading and Pig > >=== Expectations === >There is a risk that Accumulo will be criticized for not providing >adequate security. The access labels in Accumulo do not in themselves >provide a complete security solution, but are a mechanism for labeling >each piece of data with the authorizations that are necessary to see it. > >=== Apache Brand === >Our interest in releasing this code as an Apache incubator project is >due to its strong relationship with other Apache projects, i.e. Accumulo >has dependencies on Hadoop, Zookeeper, and Thrift and has complementary >goals to HBase. > >== Documentation == >There is not currently documentation about Accumulo on the web, but a >fair amount of documentation and training materials exists and will be >provided on the Accumulo wiki at apache.org. Also, a paper discussing >YCSB results for Accumulo will be presented at the 2011 Symposium on >Cloud Computing. > >== Initial Source == >Accumulo has been in development since spring 2008. There are hundreds >of developers using it and tens of developers have contributed to it. 
>The core codebase consists of 200,000 lines of code (mainly Java) and >100s of pages of documentation. There are also a few projects built on >top of Accumulo that may be added to its contrib in the future. These >include support for Hive, Matlab, YCSB, and graph processing. > >== Source and Intellectual Property Submission Plan == >Accumulo core code, examples, documentation, and training materials will >be submitted by the National Security Agency. > >We will also be soliciting contributions of further plugins from MIT >Lincoln Labs, Carnegie Mellon University, and others. > >Accumulo has been developed by a mix of government employees and private >companies under government contract. Material developed by government >employees is in the public domain and no U.S. copyright exists in works >of the federal government. For the contractor developed material in the >initial submission, the U.S. Government has sufficient authority per the >ICLA from the copyright owner to contribute the Accumulo code to the >incubator. > >There has been some discussion regarding accepting contributions from US >Government sources on https://issues.apache.org/jira/browse/LEGAL-93. We >propose that the NSA will sign an ICLA/CCLA if that document could be >slightly modified to explicitly address copyright in works of government >employees. Specifically, we propose that the definition of "You" be >modified to include "the copyright owner, the owner of a Contribution >not subject to copyright, or legal entity authorized by the copyright >owner that is making this Agreement." In addition, section 2, the >copyright license grant be modified after "You hereby grant" that either >states "to the extent authorized by law" or "to the extent copyright >exists in the Contribution." These changes will permit US Government >employee developed work to be included.
> >One proposed solution is to form a Collaborative Research and >Development Agreement (CRADA) between the Apache Software Foundation and >the US Government, but this will not solve the underlying problem that >U.S. law does not grant copyright to works of government employees. At >this time a CRADA is not necessary but should it be determined that a >CRADA is necessary, we would like to work through that process during >the incubation phase of Accumulo rather than before acceptance as this >may take time to enter into an agreement. > >== External Dependencies == >jetty (Apache and EPL), jline (BSD), jfreechart (LGPL), jcommon (LGPL), >slf4j (MIT), junit (CPL) > >== Cryptography == >none > >== Required Resources == > * Mailing Lists > * accumulo-private > * accumulo-dev > * accumulo-commits > * accumulo-user > > * Subversion Directory > * https://svn.apache.org/repos/asf/incubator/accumulo > > * Issue Tracking > * JIRA Accumulo (ACCUMULO) > > * Continuous Integration > * Jenkins builds on https://builds.apache.org/ > > * Web > * http://incubator.apache.org/accumulo/ > * wiki at http://wiki.apache.org or http://cwiki.apache.org > >== Initial Committers == > * Aaron Cordova (aaron at cordovas dot org) > * Adam Fuchs (adam.p.fuchs at ugov dot gov) > * Eric Newton (ecn at swcomplete dot com) > * Billie Rinaldi (billie.j.rinaldi at ugov dot gov) > * Keith Turner (keith.turner at ptech-llc dot com) > * John Vines (john.w.vines at ugov dot gov) > * Chris Waring (christopher.a.waring at ugov dot gov) > >== Affiliations == > * Aaron Cordova, The Interllective > * Adam Fuchs, National Security Agency > * Eric Newton, SW Complete Incorporated > * Billie Rinaldi, National Security Agency > * Keith Turner, Peterson Technology LLC > * John Vines, National Security Agency > * Chris Waring, National Security Agency > >== Sponsors == > * Champion: Doug Cutting > >== Nominated Mentors == > * Benson Margulies > * Alan Cabrera > * Bernd Fondermann > * Owen O'Malley > >== Sponsoring 
Entity == > * Apache Incubator > > >--------------------------------------------------------------------- >To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org >For additional commands, e-mail: general-h...@incubator.apache.org