Also, a Chinese localized operating system is pretty clearly different from an olap engine.
For comparison see the recent non-issue regarding Amazon aurora versus apache aurora. Sent from my iPhone > On Nov 14, 2014, at 9:55, Henry Saputra <henry.sapu...@gmail.com> wrote: > > Thanks for the reminder Ross. > Hopefully we could go in the similar route as Apache Spark, Apache > Storm, and Apache MetaModel where the trademark should be used as > 'Apache Kylin'. > > > - Henry > > On Fri, Nov 14, 2014 at 7:47 AM, Ross Gardler (MS OPEN TECH) > <ross.gard...@microsoft.com> wrote: >> Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin >> >> Sent from my Windows Phone >> ________________________________ >> From: Luke Han<mailto:luke...@gmail.com> >> Sent: 11/14/2014 7:38 AM >> To: general@incubator.apache.org<mailto:general@incubator.apache.org> >> Subject: [PROPOSAL] Kylin for Incubation >> >> Hi all, >> We would like to propose Kylin as an Apache Incubator project. The >> complete proposal can be found: >> https://wiki.apache.org/incubator/KylinProposal and posted the text of >> the proposal below. >> >> Thanks. >> Luke >> >> >> Kylin Proposal >> ============== >> >> # Abstract >> >> Kylin is a distributed and scalable OLAP engine built on Hadoop to >> support extremely large datasets. >> >> # Proposal >> >> Kylin is an open source Distributed Analytics Engine that provides >> multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to >> accelerate analytics on Hadoop by allowing the use of SQL-compatible >> tools. Kylin provides a SQL interface and multi-dimensional analysis >> (MOLAP) on Hadoop to support extremely large datasets and tightly >> integrate with Hadoop ecosystem. >> >> ## Overview of Kylin >> >> Kylin platform has two parts of data processing and interactive: >> First, Kylin will read data from source, Hive, and run a set of tasks >> including Map Reduce job, shell script to pre-calcuate results for a >> specified data model, then save the resulting OLAP cube into storage >> such as HBase. Once these OLAP cubes are ready, a user can submit a >> request from any SQL-based tool or third party applications to Kylin’s >> REST server. The Server calls the Query Engine to determine if the >> target dataset already exists. If so, the engine directly accesses the >> target data in the form of a predefined cube, and returns the result >> with sub-second latency. Otherwise, the engine is designed to route >> non-matching queries to whichever SQL on Hadoop tool is already >> available on a Hadoop cluster, such as Hive. >> >> Kylin platform includes: >> >> - Metadata Manager: Kylin is a metadata-driven application. The Kylin >> Metadata Manager is the key component that manages all metadata stored >> in Kylin including all cube metadata. All other components rely on the >> Metadata Manager. >> >> - Job Engine: This engine is designed to handle all of the offline >> jobs including shell script, Java API, and Map Reduce jobs. The Job >> Engine manages and coordinates all of the jobs in Kylin to make sure >> each job executes and handles failures. >> >> - Storage Engine: This engine manages the underlying storage – >> specifically, the cuboids, which are stored as key-value pairs. The >> Storage Engine uses HBase – the best solution from the Hadoop >> ecosystem for leveraging an existing K-V system. Kylin can also be >> extended to support other K-V systems, such as Redis. >> >> - Query Engine: Once the cube is ready, the Query Engine can receive >> and parse user queries. It then interacts with other components to >> return the results to the user. >> >> - REST Server: The REST Server is an entry point for applications to >> develop against Kylin. Applications can submit queries, get results, >> trigger cube build jobs, get metadata, get user privileges, and so on. >> >> - ODBC Driver: To support third-party tools and applications – such as >> Tableau – we have built and open-sourced an ODBC Driver. The goal is >> to make it easy for users to onboard. >> >> # Background >> >> The challenge we face at eBay is that our data volume is becoming >> bigger and bigger while our user base is becoming more diverse. For >> e.g. our business users and analysts consistently ask for minimal >> latency when visualizing data on Tableau and Excel. So, we worked >> closely with our internal analyst community and outlined the product >> requirements for Kylin: >> >> - Sub-second query latency on billions of rows >> - ANSI SQL availability for those using SQL-compatible tools >> - Full OLAP capability to offer advanced functionality >> - Support for high cardinality and very large dimensions >> - High concurrency for thousands of users >> - Distributed and scale-out architecture for analysis in the TB to PB size >> range >> >> Existing SQL-on-Hadoop solutions commonly need to perform partial or >> full table or file scans to compute the results of queries. The cost >> of these large data scans can make many queries very slow (more than a >> minute). The core idea of MOLAP (multi-dimensional OLAP) is to >> pre-compute data along dimensions of interest and store resulting >> aggregates as a "cube". MOLAP is much faster but is inflexible. We >> realized that no existing product met our exact requirements >> externally – especially in the open source Hadoop community. To meet >> our emerging business needs, we built a platform from scratch to >> support MOLAP for these business requirements and then to support more >> others include ROLAP. With an excellent development team and several >> pilot customers, we have been able to bring the Kylin platform into >> production as well as open source it. >> >> # Rationale >> >> When data grows to petabyte scale, the process of pre-calculation of a >> query takes a long time and costly and powerful hardware. However, >> with the benefit of Hadoop’s distributed computing architecture, jobs >> can leverage hundreds or thousands of Hadoop data nodes. There still >> exists a big gap between the growing volume of data and interactive >> analytics: >> >> - Existing Business Intelligence (OLAP) platforms cannot scale out to >> support fast growing data. >> - Existing SQL on Hadoop projects are not designed for OLAP use cases, >> huge tables joins will always take long time to scan and calculate. >> - No mature OLAP solution exists on Hadoop >> >> As mentioned in the background, the business requirements triggered by >> increase in data volume drove eBay to invest in building a solution >> from scratch to offer Analytics capability on Hadoop cluster. With >> Hadoop’s power of distributed computing Kylin can perform >> pre-calculations in parallel and merge the final results, thereby >> significantly reducing the processing time. >> >> To serve queries by the analyst community, Kylin generates cuboids >> with all possible combinations of dimensions, and calculate all >> metrics at different levels. The cuboids are then integrated to form a >> pre-calculated OLAP cube. All cuboids are key-value structured: keys >> are composites formed from combinations of multiple dimensions and >> values are aggregations results for that particular combination of >> dimensions. Kylin uses HBase to store cubes. HBase is useful because >> it supports efficient searches across ranges of data. >> >> # Current Status >> >> ## Meritocracy >> >> Kylin has been deployed in production at eBay and is processing >> extremely large datasets. The platform has demonstrated great >> performance benefits and has proved to be a better way for analysts to >> leverage data on Hadoop with a more convenient approach using their >> favorite tool. >> >> ## Community >> >> Kylin seeks to develop developer and user communities during incubation. >> >> ## Core Developers >> >> Kylin is currently being designed and developed by six engineers from >> eBay Inc. – Jiang Xu, Luke Han, Yang Li, George Song, Hongbin Ma and >> Xiaodong Duo. In addition, some outside contributors are actively >> contributing in design and development. Among them, Julian Hyde from >> Hortonworks is a very important contributor. All of these core >> developers have deep expertise in Hadoop and the Hadoop Ecosystem in >> general. >> >> ## Alignment >> >> The ASF is a natural host for Kylin given that it is already the home >> of Hadoop, Pig, Hive, and other emerging cloud software projects. >> Kylin was designed to offer OLAP capability on Hadoop from the >> beginning in order to solve data access and analysis challenges in >> Hadoop clusters. Kylin complements the existing Hadoop analytics area >> by providing a comprehensive solution based on pre-computed views. >> >> In Kylin, we are leveraging an open-source dynamic data management >> framework called Apache Calcite to parse SQL and plug in our code. >> Apache Calcite was previously called Optiq, was originally authored by >> Julian Hyde and is now an Apache Incubator project. >> >> # Known Risks >> >> ## Orphaned Products >> >> The core developers of Kylin team plan to work full time on this >> project. There is very little risk of Kylin getting orphaned since at >> least one large company (eBay) is extensively using it in their >> production Hadoop clusters. For example, currently there are 3 use >> cases with more that 12+Billion rows and 1000 activity requests per >> day using Kylin in production. Furthermore, since Kylin was open >> sourced at the beginning of October 2014, it has received more than >> 280 stars and been forked nearly 100 times. Kylin has one major >> release so far and and received 5 pull requests from contributors in >> the first month pull requests from external sources in the last month, >> which further demonstrates Kylin as a very active project. We plan to >> extend and diversify this community further through Apache. >> >> ## Inexperience with Open Source >> >> The core developers are all active users and followers of open source. >> They are already committers and contributors to the Kylin Github >> project. All have been involved with the source code that has been >> released under an open source license, and several of them also have >> experience developing code in an open source environment. Though the >> core set of Developers do not have Apache Open Source experience, >> there are plans to onboard individuals with Apache open source >> experience on to the project. >> >> ## Homogenous Developers >> >> The core developers include developers from eBay, Ctrip and >> Hortonworks. Apache Incubation process encourages an open and diverse >> meritocratic community. Apache Kylin has the required amount of >> diversity with committers from three different organizations, but is >> also aware that bulk of the commits come from a single entity. Kylin >> intends to make every possible effort to build a diverse, vibrant and >> involved community and has already received substantial interest from >> various organizations >> >> ## Reliance on Salaried Developers >> >> eBay invested in Kylin as the OLAP solution on top of Hadoop clusters >> and some of its key engineers are working full time on the project. In >> addition, since there is a growing Big Data need for scalable OLAP >> solutions on Hadoop, we look forward to other Apache developers and >> researchers to contribute to the project. Additional contributors, >> including Apache committers have plans to join this effort shortly. >> Also key to addressing the risk associated with relying on Salaried >> developers from a single entity is to increase the diversity of the >> contributors and actively lobby for Domain experts in the BI space to >> contribute. Apache Kylin intends to do this. One approach already >> taken is to approach the Apache Drill project to explore possible >> cooperation. >> >> ## Relationships with Other Apache Products >> >> Kylin has a strong relationship and dependency with Apache Hadoop >> HBase, Hive and Calcite. Being part of Apache’s Incubation community, >> could help with a closer collaboration among these four projects and >> as well as others. >> >> Kylin is likely to have substantial value to Apache Drill due to the >> common use of Calcite as a query optimization engine and similar >> approaches between Kylin's approach to cubing and Drill's approach to >> input sources. >> >> ## An Excessive Fascination with the Apache Brand >> >> Kylin is proposing to enter incubation at Apache in order to help >> efforts to diversify the committer-base, not so much to capitalize on >> the Apache brand. The Kylin project is in production use already >> inside EBay, but is not expected to be an EBay product for external >> customers. As such, the Kylin project is not seeking to use the Apache >> brand as a marketing tool. >> >> # Documentation >> >> Information about Kylin can be found at >> https://github.com/KylinOLAP/Kylin. The following links provide more >> information about Kylin in open source: >> >> - Kylin web site: http://kylin.io >> - Codebase at Github: https://github.com/KylinOLAP/Kylin >> - Issue Tracking: https://github.com/KylinOLAP/Kylin/issues >> - User community: https://groups.google.com/forum/#!forum/kylin-olap >> >> ## Initial Source >> >> Kylin has been under development since 2013 by a team of engineers at >> eBay Inc. It is currently hosted on Github.com under an Apache license >> at https://github.com/KylinOLAP/Kylin >> >> ## External Dependencies >> >> Kylin has the following external dependencies. >> >> * Basic >> >> - JDK 1.6+ >> - Apache Maven >> - JUnit >> - DBUnit >> - Log4j >> - Slf4j >> - Apache Commons >> - Google Guava >> - Jackson >> >> * Hadoop >> >> - Apache Hadoop >> - Apache HBase >> - Apache Hive >> - Apache Zookeeper >> - Apache Curator >> >> * Utility >> >> - H2 >> - JSCH >> >> * REST Service >> >> - Spring >> >> * Query >> >> - Antlr >> - Apache Calcite (formerly Optiq) >> - Linq4j >> >> * Job >> >> - Quartz >> >> * Web build tool >> >> - NPM >> - Grunt >> - bower >> >> * Web >> >> - Angular JS >> - jQuery >> - Bootstrap >> - D3 JS >> - ACE >> >> ##Cryptography >> >> Kylin will eventually support encryption on the wire. This is not one >> of the initial goals, and we do not expect Kylin to be a controlled >> export item due to the use of encryption. Kylin supports but does not >> require the Kerberos authentication mechanism to access secured Hadoop >> services. >> >> # Required Resources >> >> ## Mailing List >> >> - kylin-private for private PMC discussions (with moderated subscriptions) >> - kylin-dev >> - kylin-commits >> >> ##Subversion Directory >> >> Git is the preferred source control system: git://git.apache.org/Kylin >> >> ## Issue Tracking >> >> JIRA Kylin (KYLIN) >> >> ## Other Resources >> >> The existing code already has unit tests so we will make use of >> existing Apache continuous testing infrastructure. The resulting load >> should not be very large. >> >> # Initial Committers >> >> - Jiang Xu < jiangxu.china at gmail dot com> >> - Luke Han <lukhan at ebay dot com> >> - Yang Li <yangli9 at ebay dot com> >> - George Song <ysong1 at ebay dot com> >> - Hongbin Ma <honma at ebay dot com> >> - Xiaodong Duo < oranjedog at gmail dot com> >> - Julian Hyde < jhyde at apache dot org > >> - Ankur Bansal < abansal at ebay dot com> >> >> ## Affiliations >> >> The initial committers are employees of eBay Inc., Ctrip and >> Hortonworks. The nominated mentors are employees of Hortonworks, MapR >> Technologies and Pivotal. >> >> # Sponsors >> >> ## Champion >> >> - Owen O’Malley < omalley at apache dot org > >> - Ted Dunning <tdunning at apache dot org> >> >> ## Nominated Mentors >> >> - Owen O’Malley < omalley at apache dot org > - Apache IPMC member, >> Co-founder and Senior Architect, Hortonworks >> - Ted Dunning < tdunning at apache dot org> - Apache IPMC member, >> Chief Architect, MapR Technologies >> - Henry Saputra <hsaputra at apache dot org> - Apache IPMC member, Pivotal >> - Jacques Nadeau <jacques at apache dot org> (pending admission to >> IPMC) - Apache Drill PMC Chair, MapR Technologies >> >> #Sponsoring Entity >> >> We are requesting the Incubator to sponsor this project. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > For additional commands, e-mail: general-h...@incubator.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org