Thanks very much, Henry. It's great that we have these mailing lists so we can communicate well with the community.
Thanks,
Edward Zhang

On 11/3/15, 16:46, "Henry Saputra" <henry.sapu...@gmail.com> wrote:

>Follow-up announcement: the Apache Eagle (incubating) mailing lists are now available:
>
>• d...@eagle.incubator.apache.org (subscribe by sending email to dev-subscr...@eagle.incubator.apache.org)
>• comm...@eagle.incubator.apache.org (subscribe by sending email to commits-subscr...@eagle.incubator.apache.org)
>• u...@eagle.incubator.apache.org (subscribe by sending email to user-subscr...@eagle.incubator.apache.org)
>
>Thanks,
>
>Henry
>
>On Fri, Oct 23, 2015 at 7:11 AM, Manoharan, Arun <armanoha...@ebay.com> wrote:
>> Hello Everyone,
>>
>> Thanks for all the feedback on the Eagle proposal.
>>
>> I would like to call for a [VOTE] on Eagle joining the ASF as an incubation project.
>>
>> The vote is open for 72 hours:
>>
>> [ ] +1 accept Eagle in the Incubator
>> [ ] ±0
>> [ ] -1 (please give reason)
>>
>> Eagle is a monitoring solution for Hadoop that instantly identifies access to sensitive data, recognizes attacks and malicious activities, and takes action in real time. Eagle supports a wide variety of policies on HDFS data and Hive. Eagle also provides machine learning models for detecting anomalous user behavior in Hadoop.
>>
>> The proposal is available on the wiki here:
>> https://wiki.apache.org/incubator/EagleProposal
>>
>> The text of the proposal is also available at the end of this email.
>>
>> Thanks for your time and help.
>>
>> Thanks,
>> Arun
>>
>> <COPY of the proposal in text format>
>>
>> Eagle
>>
>> Abstract
>> Eagle is an open-source monitoring solution for Hadoop that instantly identifies access to sensitive data, recognizes attacks and malicious activities in Hadoop, and takes action.
>>
>> Proposal
>> Eagle audits access to HDFS files and to Hive and HBase tables in real time, enforces policies defined on sensitive data access, and alerts on or blocks a user's access to that sensitive data in real time. Eagle also builds user profiles based on typical access behaviour for HDFS and Hive and sends alerts when anomalous behaviour is detected. Eagle can also import sensitive data information classified by external classification engines to help define its policies.
>>
>> Overview of Eagle
>> Eagle has three main parts:
>> 1. Data collection and storage - Eagle collects data from various Hadoop logs in real time using Kafka and the YARN API, and uses HDFS and HBase for storage.
>> 2. Data processing and policy engine - Eagle allows users to create policies based on various metadata properties of HDFS, Hive and HBase data.
>> 3. Eagle services - Eagle services include the policy manager, the query service and the visualization component. Eagle provides an intuitive user interface to administer Eagle and an alert dashboard for responding to real-time alerts.
>>
>> Data Collection and Storage:
>> Eagle provides a programming API for integrating any data source into the Eagle policy evaluation framework. For example, Eagle HDFS audit monitoring collects data from Kafka, which is populated from a NameNode log4j appender or from a Logstash agent. Eagle Hive monitoring collects Hive query logs from running jobs through the YARN API, which is designed to be scalable and fault-tolerant. Eagle uses HBase to store metadata and metrics data, and also supports relational databases through a configuration change.
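As a concrete picture of this ingestion path, here is a minimal sketch (not Eagle's actual code) of consuming NameNode audit-log lines from Kafka and splitting them into key=value fields before any policy evaluation. The broker address, group id and topic name "hdfs_audit_log" are assumptions made for the example, and it uses the plain Kafka consumer client rather than Eagle's collectors.

    // Illustrative only: tails HDFS audit events from Kafka the way an Eagle-style
    // collector might, and splits each line into key=value fields.
    // Broker address, group id and topic name below are assumptions, not Eagle defaults.
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class HdfsAuditLogTailer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumption: local broker
            props.put("group.id", "audit-demo");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("hdfs_audit_log"));  // hypothetical topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records) {
                        // A NameNode audit line carries fields like:
                        //   allowed=true ugi=alice ip=/10.1.2.3 cmd=open src=/secure/pii.csv dst=null perm=null
                        Map<String, String> event = new HashMap<>();
                        for (String token : record.value().split("\\s+")) {
                            int eq = token.indexOf('=');
                            if (eq > 0) {
                                event.put(token.substring(0, eq), token.substring(eq + 1));
                            }
                        }
                        System.out.println(event.get("ugi") + " " + event.get("cmd") + " " + event.get("src"));
                    }
                }
            }
        }
    }

In Eagle itself this kind of ingestion is wired into the stream processing framework described in the next section rather than run as a standalone loop.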
>>
>> Data Processing and Policy Engine:
>> Processing Engine: Eagle provides a stream processing API that is an abstraction over Apache Storm and can also be extended to other streaming engines. This abstraction allows developers to assemble data transformation, filtering, external data joins, etc. without being physically bound to a specific streaming platform. The Eagle streaming API lets developers easily integrate business logic with the Eagle policy engine; internally, the Eagle framework compiles the business-logic execution DAG into the program primitives of the underlying stream infrastructure, e.g. Apache Storm. For example, Eagle HDFS monitoring transforms audit log entries from the NameNode into objects and joins them with sensitivity metadata and security-zone metadata, which are generated by external programs or configured by the user. Eagle Hive monitoring filters running jobs to get the Hive query string, parses the query string into an object, and then joins it with sensitivity metadata.
>> Alerting Framework: The Eagle alert framework includes a stream metadata API, a scalable policy engine framework, and an extensible policy engine framework. The stream metadata API allows developers to declare an event schema: which attributes constitute an event, the type of each attribute, and how to dynamically resolve attribute values at runtime when a user configures a policy. The scalable policy engine framework allows policies to be executed on different physical nodes in parallel; it can also be used to define your own policy partitioner class. The policy engine framework, together with the stream partitioning capability provided by all streaming platforms, ensures that policies and events can be evaluated in a fully distributed way. The extensible policy engine framework allows developers to plug in a new policy engine with a few lines of code. The WSO2 Siddhi CEP engine is the policy engine that Eagle supports as a first-class citizen (a small Siddhi example is sketched after the Eagle Services section below).
>> Machine Learning module: Eagle provides capabilities to define user activity patterns, or user profiles, for Hadoop users based on their behaviour on the platform. These user profiles are modeled using machine learning algorithms and used to detect anomalous user activity. Eagle uses Eigenvalue Decomposition and Density Estimation algorithms to generate user profile models. The modeling reads data from HDFS audit logs, preprocesses and aggregates the data, and generates models using the Spark programming APIs. Once models are generated, Eagle uses the stream processing engine for near-real-time anomaly detection to determine whether a user's activities are suspicious.
>>
>> Eagle Services:
>> Query Service: Eagle provides a SQL-like service API to support comprehensive computation over huge data sets on the fly, e.g. filtering, aggregation, histograms, sorting, top-N, arithmetic expressions, pagination, etc. HBase is the data storage that Eagle supports as a first-class citizen; relational databases are supported as well. For HBase storage, the Eagle query framework compiles a user-provided SQL-like query into native HBase filter objects and executes it through an HBase coprocessor on the fly.
>> Policy Manager: The Eagle policy manager provides a UI and a RESTful API for users to define policies with just a few clicks. It includes a site management UI, a policy editor, sensitivity metadata import, HDFS and Hive sensitive resource browsing, alert dashboards, etc.
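To make the policy-engine description above more concrete, the following is a minimal sketch of evaluating a single policy with the WSO2 Siddhi 3.x API on its own, outside of Eagle's alerting framework. The stream name, attribute names and the example policy text are illustrative assumptions, not Eagle's actual event schema.

    // Illustrative only: evaluates one Siddhi QL policy against hand-fed events.
    // Stream and attribute names are invented for the example; Eagle declares these
    // through its stream metadata API and runs the engine inside the streaming platform.
    import org.wso2.siddhi.core.ExecutionPlanRuntime;
    import org.wso2.siddhi.core.SiddhiManager;
    import org.wso2.siddhi.core.event.Event;
    import org.wso2.siddhi.core.stream.input.InputHandler;
    import org.wso2.siddhi.core.stream.output.StreamCallback;

    public class PolicyEngineSketch {
        public static void main(String[] args) throws InterruptedException {
            SiddhiManager siddhiManager = new SiddhiManager();

            // One policy: alert whenever a sensitive (PII-tagged) file is opened.
            String executionPlan =
                "define stream hdfsAuditStream (user string, cmd string, src string, sensitivityType string); " +
                "from hdfsAuditStream[sensitivityType == 'PII' and cmd == 'open'] " +
                "select user, src insert into alertStream;";

            ExecutionPlanRuntime runtime = siddhiManager.createExecutionPlanRuntime(executionPlan);
            runtime.addCallback("alertStream", new StreamCallback() {
                @Override
                public void receive(Event[] events) {
                    for (Event e : events) {
                        System.out.println("ALERT: user=" + e.getData(0) + " opened " + e.getData(1));
                    }
                }
            });

            InputHandler input = runtime.getInputHandler("hdfsAuditStream");
            runtime.start();
            // Feed one event that matches the policy and one that does not.
            input.send(new Object[]{"alice", "open", "/secure/pii/users.csv", "PII"});
            input.send(new Object[]{"bob", "open", "/tmp/scratch.txt", "NONE"});
            Thread.sleep(500);   // give the asynchronous engine a moment to fire the callback
            runtime.shutdown();
            siddhiManager.shutdown();
        }
    }

In Eagle, policy definitions of this kind are managed through the policy manager and evaluated in parallel across the distributed policy-engine instances described above.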
>>
>> Background
>> Data is one of the most important assets for today's businesses, which makes data security one of the top priorities of today's enterprises. Hadoop is widely used across different verticals as the big data repository that stores this data in most modern enterprises.
>> At eBay we use the Hadoop platform extensively for our data processing needs. Our data in Hadoop keeps growing as our user base sees exponential growth. Today there is a wide variety of data sets in our Hadoop clusters for users to consume. eBay has around 120 PB of data stored in HDFS across 6 different clusters, and around 1800+ active Hadoop users consume data through Hive, HBase and MapReduce jobs every day to build applications on this data. With this astronomical growth of data come challenges in securing sensitive data and monitoring access to it. Today, in large organizations, HDFS is the de facto standard for storing big data. Data sets including, but not limited to, consumer sentiment, social media data, customer segmentation, web clicks, sensor data, geo-location and transaction data are stored in Hadoop for day-to-day business needs.
>> We at eBay want to make sure the sensitive data and data platforms are completely protected from security breaches, so we partnered very closely with our Information Security team to understand the requirements for Eagle to monitor sensitive data access on Hadoop:
>> 1. Ability to identify and stop security threats in real time
>> 2. Scale for big data (support PB scale and billions of events)
>> 3. Ability to create data access policies
>> 4. Support for multiple data sources such as HDFS, HBase and Hive
>> 5. Visualization of alerts in real time
>> 6. Ability to block malicious access in real time
>> We did not find any data access monitoring solution available today that provides the features and functionality we need to monitor data access in the Hadoop ecosystem at our scale. Hence, with an excellent team of world-class developers and several users, we have been able to bring Eagle into production as well as open source it.
>>
>> Rationale
>> In today's world, data is an important asset for any company. Businesses are using data extensively to create amazing experiences for users. Data has to be protected, and access to data should be secured against breaches. Today Hadoop is used not only to store logs but also financial data, sensitive data sets, geographical data, user click-stream data sets, etc., which makes it all the more important to protect it from security breaches. Securing a data platform requires several things. One is a strong access control mechanism, which today is provided by Apache Ranger and Apache Sentry; these tools provide fine-grained access control for data sets on Hadoop. But there is a big gap in monitoring all the data access events and activities needed to secure the Hadoop data platform. With strong access control, perimeter security and data access monitoring in place, data in Hadoop clusters can be secured against breaches. We looked around and found the following:
>> Existing data activity monitoring products are designed for traditional databases and data warehouses. Existing monitoring platforms cannot scale out to support fast-growing, petabyte-scale data. The few products in the industry that do exist are still very early in their support for HDFS, Hive and HBase data access monitoring.
>> As mentioned in the background, the business requirement and the urgency to secure data from users with malicious intent drove eBay to invest in building a real-time data access monitoring solution from scratch that offers real-time alerts and remediation features for malicious data access.
>> With the power of open-source distributed systems like Hadoop, Kafka and many more, we were able to develop a data activity monitoring system that can scale, and identify and stop malicious access in real time.
>> Eagle allows administrators to create standard access policies and rules for monitoring HDFS, Hive and HBase data. Eagle also provides out-of-the-box machine learning models that build user profiles from access behaviour and use them to alert on anomalies.
>>
>> Current Status
>>
>> Meritocracy
>> Eagle has been deployed in production at eBay to monitor billions of events per day from HDFS and Hive operations. From the start, the product has been built with high scalability and application extensibility in mind, and Eagle has demonstrated great performance in responding to suspicious events instantly and great flexibility in defining policies.
>>
>> Community
>> Eagle seeks to develop its developer and user communities during incubation.
>>
>> Core Developers
>> Eagle is currently being designed and developed by engineers from eBay Inc.: Edward Zhang, Hao Chen, Chaitali Gupta, Libin Sun, Jilin Jiang, Qingwen Zhao, Senthil Kumar, Hemanth Dendukuri and Arun Manoharan. All of these core developers have deep expertise in developing monitoring products for the Hadoop ecosystem.
>>
>> Alignment
>> The ASF is a natural host for Eagle given that it is already the home of Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big data projects. Eagle leverages many Apache open-source products. Eagle was designed to offer real-time insight into sensitive data access by actively monitoring data access across various data sets in Hadoop, with an extensible alerting framework and a powerful policy engine. Eagle complements the existing Hadoop platform by providing a comprehensive monitoring and alerting solution that detects sensitive data access threats based on preset policies and on machine learning models for user behaviour analysis.
>>
>> Known Risks
>>
>> Orphaned Products
>> The core developers of the Eagle team work full time on this project. There is no risk of Eagle becoming orphaned, since eBay uses it extensively in its production Hadoop clusters and has plans to go beyond Hadoop. For example, there are currently 7 Hadoop clusters, 2 of which are monitored by Eagle in production. We plan to extend it to all Hadoop clusters and eventually to other data platforms. There are tens of policies onboarded and actively monitored, with plans to onboard more use cases. We are very confident that every Hadoop cluster in the world will be monitored using Eagle, securing the Hadoop ecosystem by actively monitoring access to sensitive data. We plan to extend and diversify this community further through Apache. We presented Eagle at Hadoop Summit in China and garnered interest from different companies who use Hadoop extensively.
>>
>> Inexperience with Open Source
>> The core developers are all active users and followers of open source. They are already committers on and contributors to the Eagle GitHub project.
All have been involved with source code that has been released under an open-source license, and several of them also have experience developing code in an open-source environment. Though the core set of developers does not have Apache open-source experience, there are plans to onboard individuals with Apache open-source experience onto the project. Apache Kylin PMC members are also in the same eBay organization. We work very closely with Apache Ranger committers and look forward to finding meaningful integrations to improve the security of the Hadoop platform.
>>
>> Homogenous Developers
>> The core developers are from eBay. Today, monitoring data activities to find and stop threats is a universal problem faced by all businesses. The Apache incubation process encourages an open and diverse meritocratic community. Eagle intends to make every possible effort to build a diverse, vibrant and involved community, and has already received substantial interest from various organizations.
>>
>> Reliance on Salaried Developers
>> eBay invested in Eagle as the monitoring solution for its Hadoop clusters, and some of its key engineers are working full time on the project. In addition, since there is a growing need to secure sensitive data access and thus for a data activity monitoring solution for Hadoop, we look forward to other Apache developers and researchers contributing to the project. Additional contributors, including Apache committers, plan to join this effort shortly. Key to addressing the risk of relying on salaried developers from a single entity is increasing the diversity of the contributors and actively lobbying for domain experts in the security space to contribute; Eagle intends to do this.
>>
>> Relationships with Other Apache Products
>> Eagle has a strong relationship with, and dependencies on, Apache Hadoop, HBase, Spark, Kafka and Storm. Being part of Apache's incubation community could foster closer collaboration with these projects as well as others.
>>
>> An Excessive Fascination with the Apache Brand
>> Eagle is proposing to enter incubation at Apache in order to help diversify the committer base, not so much to capitalize on the Apache brand. The Eagle project is already in production use inside eBay, but is not expected to be an eBay product for external customers. As such, the Eagle project is not seeking to use the Apache brand as a marketing tool.
>>
>> Documentation
>> Information about Eagle can be found at https://github.com/eBay/Eagle. The following link provides more information about Eagle: http://goeagle.io
>>
>> Initial Source
>> Eagle has been under development since 2014 by a team of engineers at eBay Inc. It is currently hosted on GitHub under the Apache License 2.0 at https://github.com/eBay/Eagle. Once in incubation we will move the code base to an Apache git repository.
>>
>> External Dependencies
>> Eagle has the following external dependencies.
>> Basic
>> • JDK 1.7+
>> • Scala 2.10.4
>> • Apache Maven
>> • JUnit
>> • Log4j
>> • Slf4j
>> • Apache Commons
>> • Apache Commons Math3
>> • Jackson
>> • Siddhi CEP engine
>>
>> Hadoop
>> • Apache Hadoop
>> • Apache HBase
>> • Apache Hive
>> • Apache Zookeeper
>> • Apache Curator
>>
>> Apache Spark
>> • Spark Core Library
>>
>> REST Service
>> • Jersey
>>
>> Query
>> • Antlr
>>
>> Stream processing
>> • Apache Storm
>> • Apache Kafka
>>
>> Web
>> • AngularJS
>> • jQuery
>> • Bootstrap V3
>> • Moment JS
>> • Admin LTE
>> • html5shiv
>> • respond
>> • Fastclick
>> • Date Range Picker
>> • Flot JS
>>
>> Cryptography
>> Eagle will eventually support encryption on the wire. This is not one of the initial goals, and we do not expect Eagle to be a controlled export item due to the use of encryption. Eagle supports, but does not require, the Kerberos authentication mechanism to access secured Hadoop services.
>>
>> Required Resources
>>
>> Mailing Lists
>> • eagle-private for private PMC discussions
>> • eagle-dev for developers
>> • eagle-commits for all commits
>> • eagle-users for all Eagle users
>>
>> Subversion Directory
>> • Git is the preferred source control system.
>>
>> Issue Tracking
>> • JIRA Eagle (Eagle)
>>
>> Other Resources
>> The existing code already has unit tests, so we will make use of the existing Apache continuous testing infrastructure. The resulting load should not be very large.
>>
>> Initial Committers
>> • Seshu Adunuthula <sadunuthula at ebay dot com>
>> • Arun Manoharan <armanoharan at ebay dot com>
>> • Edward Zhang <yonzhang at ebay dot com>
>> • Hao Chen <hchen9 at ebay dot com>
>> • Chaitali Gupta <cgupta at ebay dot com>
>> • Libin Sun <libsun at ebay dot com>
>> • Jilin Jiang <jiljiang at ebay dot com>
>> • Qingwen Zhao <qingwzhao at ebay dot com>
>> • Hemanth Dendukuri <hdendukuri at ebay dot com>
>> • Senthil Kumar <senthilkumar at ebay dot com>
>>
>> Affiliations
>> The initial committers are employees of eBay Inc.
>>
>> Sponsors
>>
>> Champion
>> • Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>>
>> Nominated Mentors
>> • Owen O'Malley <omalley at apache dot org> - Apache IPMC member, Hortonworks
>> • Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>> • Julian Hyde <jhyde at hortonworks dot com> - Apache IPMC member, Hortonworks
>> • Amareshwari Sriramdasu <amareshwari at apache dot org> - Apache IPMC member
>> • Taylor Goetz <ptgoetz at apache dot org> - Apache IPMC member, Hortonworks
>>
>> Sponsoring Entity
>> We are requesting the Incubator to sponsor this project.

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org