+1 -C
On Sat, Mar 19, 2016 at 7:26 AM, Flavio Junqueira <f...@apache.org> wrote: > I understand the concern, so let me try to offer some facts and see if we can > make progress from there. > > Omid has been around for some time now, and its initial design appeared in a > couple of research papers that I actually co-authored. The architecture is > based on the idea of having a centralized transaction status oracle that > shares transaction status data with clients for scalability. The current Omid > project evolved out of that initial work and it is a much improved version > over that first iteration, with the improvements focusing on scalability. It > currently runs in production at scale at Yahoo! and there is interest from > other companies according to the proposal. There is a series of blog posts > about the experience in the project proposal. > > Tephra has a very similar architecture. The description here says that it has > a transaction server, which sounds like the TSO in the original Omid papers. > I haven't spent enough time understanding the precise protocol they use, but > I must say that the protocol is very important for correctness and > scalability. Having two protocols with different properties could justify the > presence of two projects, but they both promise snapshot isolation so I > suspect they will be doing very similar things. > > Overall, as I see it, it would be very unfair to reject the Omid proposal on > the basis that Tephra was incubated a couple of weeks ago. I'd much rather > see how the two communities evolve and have the mentors of the projects > fostering collaboration and possibly a merge of the two projects before > graduation. Why not think of a general transaction status oracle with > different protocol implementations assuming it makes sense? I wouldn't like > to see any of the two blocked upfront on the basis that they are in the same > space, though. We could postpone this decision until graduation when we'll > have more knowledge about the projects and the growth of the two communities. > > -Flavio > >> On 18 Mar 2016, at 23:19, Henry Saputra <henry.sapu...@gmail.com> wrote: >> >> I know Apache incubator does not play favorite but it is getting awkward >> that TWO transaction engine for HBase coming to incubator at the same time. >> >> As most people know, the other one is Tephra, that just coming to incubator >> few weeks ago. >> >> As member of IPMC, I would like to see Omid provide some more details >> comparisons about the difference that the project bring, in term of >> approach and possible integrations with other ASF projects. >> >> If possible, I would prefer to see Omid team work together with Tephra to >> work on working together to make one solid transaction engine for HBase and >> later NoSQL databases. >> >> >> - Henry >> >> On Thu, Mar 17, 2016 at 1:17 PM, Daniel Dai <dai...@gmail.com> wrote: >> >>> Hi, >>> >>> I would like to propose Omid as an Apache Incubator project: >>> >>> https://wiki.apache.org/incubator/OmidProposal >>> >>> I've posted posted the text of the proposal below: >>> >>> Thanks, >>> Daniel >>> >>> = Omid Proposal = >>> >>> === Abstract === >>> >>> Omid is a flexible, reliable, high performant and scalable ACID >>> transactional framework that allows client applications to execute >>> transactions on top of MVCC key/value-based NoSQL datastores >>> (currently Apache HBase) providing Snapshot Isolation guarantees on >>> the accessed data. >>> >>> >>> === Proposal === >>> >>> Omid is a flexible open-source transactional framework that provides >>> ACID transactions with Snapshot Isolation guarantees on top of NoSQL >>> datastores. In particular, the current codebase brings the concept of >>> transactions to the popular Apache HBase datastore. Omid offers great >>> performance, it is highly available, and scalable. Omid's current >>> version is able to scale to thousands of clients triggering concurrent >>> transactions on application data stored in HBase. Omid can scale >>> beyond 100K transactions per second on mid-range hardware while >>> incurring in a minimal impact on the speed of data access in the >>> datastore. We’re currently experimenting with a prototype version that >>> can improve the performance up to ~380K TPS. >>> >>> >>> Omid has been publicly available as an open-source project in Github >>> under Apache License Version 2.0 since 2011 [1]. During these years, >>> it has generated certain interest in the open source community, >>> especially since the public presentation of the first version in >>> Hadoop Summit 2013 [2]. Currently the Github project has 241 Stars and >>> 93 forks. Yahoo Inc. submits this proposal to the Apache Software >>> Foundation with the aim to transfer the Omid project -including its >>> source code and documentation- to Apache in order to start the build >>> of a stable open source community around it. >>> >>> >>> [1] https://github.com/yahoo/omid >>> >>> [2] Omid presentation at Hadoop Summit 2013: >>> >>> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus >>> >>> >>> === Background === >>> >>> An Omid prototype was first released as an open-source project back in >>> 2011. Inspired by Google Percolator [1], it offered a lock-free >>> approach to transactions in NoSQL datastores (See [2]). However, >>> during these years, the design of Omid has evolved significantly. >>> Whilst the current open-sourced version maintains many aspects of the >>> original implementation, it is the result of a major redesign of the >>> first prototype released in 2011. >>> >>> >>> Omid has now a more decentralized design that does not sacrifice the >>> consistency and performance of the original version. The current >>> design also enables Omid to scale to thousands of clients executing >>> transactions concurrently on application data stored in HBase. >>> Internally, Omid still utilizes a lock-free approach to support >>> multiple concurrent clients. Its design also relies on a centralized >>> conflict detection component, the TSO, which now resolves in an >>> efficient manner writeset collisions among concurrent transactions >>> without having to piggyback commit information to the clients. Another >>> important benefit of Omid is that it doesn't require any modification >>> of the underlying key-value datastore, HBase in this case. Moreover, >>> the recently added high availability algorithm allows to eliminate the >>> single point of failure represented by the TSO in those system >>> deployments requiring a higher degree of dependability. Last but not >>> least, the provided user API is very simple, mimicking transaction >>> managers in the relational world: begin, commit, rollback. >>> >>> >>> Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content >>> management platform powering some of next-generation search and >>> personalization products is using Omid as a transaction manager in its >>> processing pipeline. Sieve essentially acts as a huge processing hub >>> between content feeds and serving systems. It provides an environment >>> for highly customizable, real-time, streamed information processing, >>> with typical discovery-to-service latencies of just a few seconds. In >>> terms of scale and availability, Omid’s new design was largely driven >>> by Sieve’s requirements. >>> >>> >>> At Yahoo, we are also making an effort to disseminate the current >>> status of the project through blog entries (See [3], [4] and [5]) and >>> submissions to technical and academic conferences such as ATC 2016, >>> Hadoop Summit 2016, HBaseConf 2016. Last but not least, Omid also >>> appeared in a TechCrunch article in the last quarter of 2015 (See [6]) >>> >>> >>> [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using >>> Distributed Transactions and Notifications. USENIX Symposium on >>> Operating Systems Design and Implementation, 2010 >>> >>> [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh. >>> Omid: Lock-free transactional support for distributed data stores. In >>> Proc. of ICDE, 2013. >>> >>> [3] >>> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for >>> >>> [4] >>> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol >>> >>> [5] >>> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid >>> >>> [6] >>> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/ >>> >>> >>> === Rationale === >>> >>> Programming with ACID (Atomicity, Consistency, Isolation, Durability) >>> transactions is very popular and it is featured in relational >>> databases. However, in the Big Data ecosystem, applications typically >>> use NoSQL datastores, which do not provide ACID transactions. Such >>> NoSQL datastores used to give up transactional support for greater >>> agility and scalability. However, while early NoSQL data store >>> implementations did not include transaction support, the need for >>> transactions soon emerged in Big Data applications when accessing >>> shared data; for example, transactions are very important for >>> modern, scalable systems that process content incrementally. >>> >>> >>> NoSQL datastores -including HBase- don’t provide transactional >>> frameworks to coordinate the access to the underlying data for >>> preserving consistency. By using Omid, Big Data applications that need >>> to bundle multiple read and write operations on HBase into logically >>> indivisible units of work can execute transactions with ACID >>> properties, just as they would use transactions in the relational >>> database world. Omid extends the HBase key-value access APl with >>> transaction semantics. It can be exercised either directly, or via >>> higher level data management API’s. For example, Apache Phoenix >>> (SQL-on-top-of-HBase) might use Omid as its transaction management >>> component. >>> >>> >>> The following features make Omid an attractive choice for system >>> designers and other projects in the Apache community: >>> >>> >>> * Semantics. Omid implements Snapshot Isolation (SI,) supported by >>> major SQL and NoSQL technologies (e.g. Google Percolator). >>> >>> >>> * Performance and Scalability. Omid provides a highly scalable, >>> lock-free implementation of SI. To the best of our knowledge, it is >>> also one of the few open source NoSQL transactional platforms that can >>> execute more than 100K transactions per second [1]. A new prototype >>> still in development can go even further, up to ~380K TPS. >>> >>> >>> * Reliability. Omid has a high-availability (HA) mode, in which the >>> core service performing writeset conflict resolution operates as >>> primary-backup process pair with automatic failover. The HA support >>> has zero overhead on the mainstream operation. >>> >>> >>> * Adaptability. Omid current version provides transactions on data >>> stored in Apache HBase. However, Omid’s components are generic enough >>> to be adapted to any other key-value NoSQL datasource that supports >>> MVCC. >>> >>> >>> * Development. Omid provides a very simple interface that mimics >>> standard HBase APIs, making it developer friendly. Only minimal >>> extensions to the standard interfaces have been introduced to enable >>> transactions. >>> >>> >>> * Simplicity. Omid leverages the HBase infrastructure for managing its >>> own metadata. It entails no additional services apart from those >>> provided and used by HBase. >>> >>> >>> * Track Record. As we have mentioned, Omid is already in use by >>> very-large-scale production systems at Yahoo. Also, Hortonworks is >>> integrating Omid in a metastore implementation for Hive based on >>> HBase. >>> >>> [1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance >>> >>> >>> === Current Status === >>> Current Omid implementation is available in both, Yahoo’s internal >>> Github repository for internal use at Yahoo as well as in Yahoo’s >>> Github public repository (https://github.com/yahoo/omid.git). Both >>> repositories are managed by Omid’s current developers at Yahoo. >>> >>> As it is mentioned above, Yahoo is currently using Omid for providing >>> transactions in Sieve, a web-scale content management platform that >>> powers Yahoo’s next-generation search and personalization products. >>> >>> >>> ==== Meritocracy ==== >>> The first version of Omid was originally created in 2011 by Maysam >>> Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio >>> Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain. >>> >>> >>> During the years after its inception, Omid has matured to operate at >>> Web scale and has been used internally by strategic projects at Yahoo >>> such as Sieve. The current base of committers belong to the Yahoo team >>> that took over the initial Omid prototype and rewrote it to meet the >>> high availability and scalability requirements of the Sieve project. >>> This base of committers has recently incorporated Hortonworks members >>> that helped in the Omid adaptation to HBase 1.x versions. >>> >>> >>> With this initial committer base, we aim to form a larger community >>> that can collaborate with new ideas over the current code base. This >>> new community will run the project following the "Apache Way" >>> (http://apache.org/foundation/governance/). Users and new contributors >>> will be treated with respect and welcomed. To grow the community, we >>> will encourage contributors to provide patches, review code, propose >>> new features improvements, talk at conferences such as Hadoop Summit, >>> HBaseCon, ApacheCon, etc. Committership and PMC membership will be >>> offered according to meritocracy. >>> >>> ==== Community ==== >>> >>> The public Yahoo Omid repository at Github currently has 241 Stars and >>> 93 forks, which means that there is an important interest for the >>> project in the open-source community, at least compared with other >>> similar projects (See https://github.com/yahoo/omid.git). >>> >>> >>> Recently, Hortonworks contributors to the Apache Hive project which >>> are working on storing Hive metadata in HBase (Apache Jira HIVE-9452) >>> manifested interest in using Omid. We started with them a fruitful >>> collaboration that resulted in Omid supporting HBase 1.x versions. >>> >>> >>> Salesforce is also interested in collaborating in doing a Proof of >>> Concept for integrating Omid as a pluggable transaction manager in >>> Apache Phoenix. >>> >>> >>> Yahoo, Hortonworks and Salesforce participants will constitute the >>> initial set of committers and mentors for the proposal. >>> >>> ==== Core Developers ==== >>> The core developers of Omid are all skilled software developers and >>> research engineers at Yahoo Inc. and Hortonworks with years of >>> experiences in their fields. At this moment, developers are >>> distributed across U.S. and Israel. The aim is to incorporate more >>> committers from different organizations and locations over time. >>> >>> >>> The current set of developers include experienced committers from >>> Apache HBase, Hive and Hadoop projects that have been working with us >>> in the current codebase found in Github. >>> >>> Finally, some of the core developers are currently NOT affiliated with >>> the ASF and would require new ICLAs to be filed. >>> >>> >>> === Alignment === >>> Omid enhances with transactions the already successful Apache HBase >>> datastore project. We have collaborated with other developers inside >>> and outside Yahoo which are involved in the Apache HBase community, so >>> we have had reliable feedback from them. >>> >>> Although Omid brings value into HBase, the design of the current >>> version provides a general transaction scheme that can potentially be >>> adapted to other MVCC key-value datastores such as Apache Cassandra. >>> >>> >>> Apache Phoenix is also a potential target. Phoenix is a SQL layer on >>> top of HBase that can potentially integrate Omid in order to provide >>> the well-know concept of transactions to Phoenix-based applications. >>> >>> >>> === Known Risks === >>> ==== Orphaned products ==== >>> Yahoo’s Research and Search organizations have been taking care of >>> Omid development since the first prototype creation in 2011. Yahoo has >>> a long history participating in open-source projects, and has been >>> also a long time contributor to the Apache community. For example, in >>> Apache, Yahoo is an important contributor in many projects in the >>> Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also >>> open-sourced other well-known projects outside Hadoop, such as >>> Zookeeper or Bookkeeper. So it is in the best interest of Yahoo make >>> Omid also a successful open-source Apache product. If this happens, we >>> are sure that a larger community will be formed around the project in >>> a relatively short period of time, contributing to the diversification >>> and stabilization of the base of committers. >>> >>> >>> ==== Inexperience with Open Source ==== >>> This project has long standing experienced mentors and interested >>> contributors from Apache HBase, Hive and Phoenix to help us moving >>> through the open source process. We are actively working with >>> experienced Apache community members to improve our project and >>> further testing. >>> >>> ==== Homogeneous Developers ==== >>> Omid has been supported by Yahoo since its inception in 2011. However, >>> all current committers are employed by their respective companies >>> shown in the Affiliations section. >>> >>> >>> ==== Reliance on Salaried Developers ==== >>> >>> All the current developers are paid by their employers to contribute >>> to this project. Yahoo developers will also continuing maintaining the >>> internal Omid repository at their company. >>> >>> Of course, other developers are welcomed to contribute to this project >>> after it is open sourced in Apache. >>> >>> ==== Relationships with Other Apache Product ==== >>> >>> Current Omid incarnation serves transactional contexts to applications >>> storing their data in HBase. However Omid design potentially allows to >>> be adapted to serve transactions on top of other MVCC-based key-value >>> datastores in Apache community such as Cassandra. >>> >>> >>> As a transactional framework, many other Apache projects such as >>> Apache Spark, Apache Phoenix, Apache Storm, Apache Flink could >>> potentially benefit from Omid to get transactional contexts. In >>> particular, Apache Phoenix -a SQL layer on top of HBase- might use >>> Omid as its transaction management component. Once we open source Omid >>> as an Apache project, we expect to generate more interest in the >>> surrounded communities. >>> >>> >>> Very recently, a new incubator proposal for a similar project called >>> Tephra, has been submitted to the ASF. We think this is good for the >>> Apache community, and we believe that there’s room for both proposals >>> as the design of each of them is based on different principles (e.g. >>> Omid does not require to maintain the state of ongoing transactions on >>> the server-side component) and due to the fact that both -Tephra and >>> Omid- have also gained certain traction in the open-source community. >>> >>> >>> With regard to the Apache projects that Omid uses, apart from HBase, >>> Omid relies on Apache Zookeeper and Curator projects in order to >>> coordinate the (re)connection of transaction managers (acting as >>> clients) to the conflict resolution component for transactions (server >>> side.) They’re also used in order to coordinate the master and backup >>> replicas in high availability scenarios. >>> >>> >>> ==== An Excessive Fascination with the Apache Brand ==== >>> >>> We are applying to the Incubator process because we think that it is >>> the logical next step for the Omid project after we open-sourced the >>> code in Github some years ago. Yahoo has a long-standing history of >>> contributing to Apache projects. The developers and contributors >>> understand the implications of making it an Apache project, and >>> strongly believe that the growing community can benefit from the >>> Apache environment, ecosystem, and infrastrastructure. >>> >>> >>> === Documentation === >>> Current documentation about the project is available in the wiki of >>> Omid’s Github repository: https://github.com/yahoo/omid/wiki . It will >>> be moved under https://omid.incubator.apache.org/docs if the project >>> is accepted as an Apache Incubator. >>> >>> === Initial Source === >>> Initial source code is currently hosted in Github for general viewing >>> and contribution: >>> >>> https://github.com/yahoo/omid.git >>> >>> >>> Omid source code is written in Java code (99%) mixed with some shell >>> script (1%) in order to configure and trigger the execution of main >>> components. >>> >>> >>> The code will be moved to Apache http://git.apache.org/ if accepted as >>> an Incubator project. >>> >>> === Source and Intellectual Property Submission Plan === >>> >>> The current Omid License for the code published in Github is Apache >>> 2.0. If Omid fulfills and passes the conditions for being an Incubator >>> project in the ASF, the source code will be transitioned via the >>> Software Grant Agreement onto the ASF infrastructure and in turn made >>> available under the Apache License, version 2.0. >>> >>> === External Dependencies === >>> >>> >>> The required external dependencies that are not Apache projects are >>> all Apache licenses or other compatible Licenses: >>> >>> Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0] >>> >>> JDK7 or OpenJDK 7 (http://java.com/) [Oracle or Openjdk JDK License] >>> >>> Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0] >>> >>> Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0] >>> >>> Testng v6.8.8 (http://testng.org) [Apache 2.0] >>> >>> SLF4J (http://www.slf4j.org/) v1.7.7 [MIT License] >>> >>> Netty (http://netty.io) v3.2.6.Final [Apache 2.0] >>> >>> Google Protocol Buffers v2.5.0 >>> (https://developers.google.com/protocol-buffers/) [BSD License] >>> >>> Mockito (http://mockito.org/) v1.9.5 [MIT License] >>> >>> LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/) >>> [Apache 2.0] >>> >>> Coda Hale/Yammer.com Dropwizard Metrics v3.0.1 >>> (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0] >>> >>> C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0] >>> >>> Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License] >>> >>> >>> === Cryptography === >>> Omid project does not use cryptography itself. However, Apache HBase >>> -the datastore on top of which Omid works in its current version- uses >>> standard APIs and tools for SSH and SSL communication where necessary. >>> >>> === Required Resources === >>> We request that following resources be created for the project to use: >>> >>> ==== Mailing lists ==== >>> >>> omid-private (moderated subscriptions) >>> >>> omid-commits (commit notification) >>> omid-dev (technical discussions) >>> >>> ==== Git repository ==== >>> https://github.com/apache/incubator-omid >>> >>> ==== Documentation ==== >>> https://omid.incubator.apache.org/docs/ >>> >>> ==== JIRA instance ==== >>> https://issues.apache.org/jira/browse/omid >>> >>> === Initial Committers === >>> >>> * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com) >>> >>> >>> * Alan Gates, Hortonworks, (gates<AT>hortonworks<DOT>com) >>> >>> >>> * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org) >>> >>> >>> * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org) >>> >>> >>> * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com) >>> >>> >>> * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com) >>> >>> * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com) >>> >>> >>> * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com) >>> >>> >>> * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com) >>> >>> >>> * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com) >>> >>> * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org>) >>> >>> >>> === Additional Interested Contributors === >>> * Ivan Kelly (ivank<AT>apache<DOT>org) >>> >>> * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com) >>> >>> >>> === Affiliations === >>> >>> * Edward Bortnikov, Yahoo Inc. >>> >>> >>> * Daniel Dai, Hortonworks >>> >>> >>> * Flavio P. Junqueira, Confluent >>> >>> >>> * Igor Katkov, Yahoo Inc. >>> >>> >>> * Ivan Kelly, Midokura >>> >>> >>> * Francis C. Liu, Yahoo Inc. >>> >>> >>> * Sameer Paranjpye, Arimo >>> >>> * Francisco Perez-Sorrosal, Yahoo Inc. >>> >>> >>> * Ohad Shacham, Yahoo Inc. >>> >>> >>> * Maysam Yabandeh, Dropbox Inc. >>> >>> >>> === Sponsors === >>> >>> ==== Champion ==== >>> >>> Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com) >>> >>> ==== Nominated Mentors ==== >>> >>> Alan Gates, Hortonworks, (gates<AT>hortonworks<DOT>com) >>> >>> Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org) >>> >>> Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org) >>> >>> Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com) >>> >>> James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org>) >>> >>> >>> ==== Sponsoring Entity ==== >>> Apache Incubator PMC >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org >>> For additional commands, e-mail: general-h...@incubator.apache.org >>> >>> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > For additional commands, e-mail: general-h...@incubator.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org