Thanks Don. Looking forward to working with the Ranger team on meaningful integrations.
On 10/20/15, 10:19 PM, "Don Bosco Durai" <bo...@apache.org> wrote:

>Hi Arun
>
>This looks really good and fills some obvious gaps in the security landscape.
>
>Happy to contribute any way you want.
>
>All the best!!!
>
>Bosco
>
>On 10/20/15, 8:02 AM, "Alex Karasulu" <akaras...@gmail.com on behalf of akaras...@apache.org> wrote:
>
>>Hi Arun,
>>
>>Eagle sounds very promising. I just had a discussion with someone about this exact need. I do however agree with Greg on the name. As far as I can see, besides the name, your weakest point is the all eBay employed team. It's not a blocker and can be fixed during incubation. Good luck to you.
>>
>>Alex
>>
>>On Tue, Oct 20, 2015 at 5:51 PM, Manoharan, Arun <armanoha...@ebay.com> wrote:
>>
>>> Hi Greg,
>>>
>>> Thank you for reviewing the proposal.
>>>
>>> Originally we thought Eagle might be trademarked by someone already but I went thru eBay legal team to get the clearance for the name to be used. We will look into it again to see if there will be potential problems.
>>>
>>> Thanks,
>>> Arun
>>>
>>> On 10/20/15, 1:52 AM, "Greg Stein" <gst...@gmail.com> wrote:
>>>
>>> >Hey there, Arun! ... I have no commentary on the proposal itself, as it looks like a great proposal. I would suggest being a bit wary of the name, as "Eagle" is a *very* popular PCB design program.
>>> >
>>> >On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun <armanoha...@ebay.com> wrote:
>>> >
>>> >> Hello Everyone,
>>> >>
>>> >> My name is Arun Manoharan. Currently a product manager in the Analytics platform team at eBay Inc.
>>> >>
>>> >> I would like to start a discussion on Eagle and its joining the ASF as an incubation project.
>>> >>
>>> >> Eagle is a Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks, malicious activities and take actions in real time.
>>> >> Eagle supports a wide variety of policies on HDFS data and Hive. Eagle also provides machine learning models for detecting anomalous user behavior in Hadoop.
>>> >>
>>> >> The proposal is available on the wiki here:
>>> >> https://wiki.apache.org/incubator/EagleProposal
>>> >>
>>> >> The text of the proposal is also available at the end of this email.
>>> >>
>>> >> Thanks for your time and help.
>>> >>
>>> >> Thanks,
>>> >> Arun
>>> >>
>>> >> <COPY of the proposal in text format>
>>> >>
>>> >> Eagle
>>> >>
>>> >> Abstract
>>> >> Eagle is an Open Source Monitoring solution for Hadoop to instantly identify access to sensitive data, recognize attacks and malicious activities in Hadoop, and take actions.
>>> >>
>>> >> Proposal
>>> >> Eagle audits access to HDFS files, Hive and HBase tables in real time, enforces policies defined on sensitive data access and alerts on or blocks a user's access to that sensitive data in real time. Eagle also creates user profiles based on the typical access behaviour for HDFS and Hive and sends alerts when anomalous behaviour is detected. Eagle can also import sensitive data information classified by external classification engines to help define its policies.
>>> >>
>>> >> Overview of Eagle
>>> >> Eagle has 3 main parts.
>>> >> 1. Data collection and storage - Eagle collects data from various Hadoop logs in real time using Kafka/YARN API and uses HDFS and HBase for storage.
>>> >> 2. Data processing and policy engine - Eagle allows users to create policies based on various metadata properties on HDFS, Hive and HBase data.
>>> >> 3. Eagle services - Eagle services include policy manager, query service and the visualization component. Eagle provides an intuitive user interface to administer Eagle and an alert dashboard to respond to real time alerts.
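To make the second part of the overview more concrete, here is a minimal sketch of evaluating policies against an audit event using metadata properties. It is not Eagle's API: the proposal says Eagle delegates policy evaluation to the Siddhi CEP engine, whereas this sketch swaps in plain Python predicates. The names `AuditEvent`, `Policy`, `evaluate`, the `sensitivity` field and the two example policies are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AuditEvent:
    """One HDFS audit record (hypothetical shape, not Eagle's schema)."""
    user: str
    cmd: str          # operation, e.g. "open" or "delete"
    src: str          # HDFS path that was accessed
    sensitivity: str  # tag assumed to be joined in from an external classifier


@dataclass(frozen=True)
class Policy:
    """A named predicate over audit events (hypothetical)."""
    name: str
    matches: Callable[[AuditEvent], bool]


def evaluate(policies: List[Policy], event: AuditEvent) -> List[str]:
    """Return the names of all policies that the event triggers."""
    return [p.name for p in policies if p.matches(event)]


# Two illustrative policies defined on metadata properties of the event.
POLICIES = [
    Policy("pii-read", lambda e: e.sensitivity == "PII" and e.cmd == "open"),
    Policy("tmp-delete", lambda e: e.cmd == "delete" and e.src.startswith("/tmp/")),
]

event = AuditEvent(user="alice", cmd="open",
                   src="/data/customers/part-0", sensitivity="PII")
alerts = evaluate(POLICIES, event)
```

In a real deployment the events would arrive as a stream and the matching policies would feed an alert dashboard; the list returned by `evaluate` only stands in for that step.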
>>> >>
>>> >> Data Collection and Storage:
>>> >> Eagle provides a programming API for extending Eagle to integrate any data source into the Eagle policy evaluation framework. For example, Eagle HDFS audit monitoring collects data from Kafka, which is populated from the NameNode log4j appender or from a Logstash agent. Eagle Hive monitoring collects Hive query logs from running jobs through the YARN API, which is designed to be scalable and fault-tolerant. Eagle uses HBase to store metadata and metrics data, and also supports relational databases through a configuration change.
>>> >>
>>> >> Data Processing and Policy Engine:
>>> >> Processing Engine: Eagle provides a stream processing API which is an abstraction of Apache Storm. It can also be extended to other streaming engines. This abstraction allows developers to assemble data transformation, filtering, external data join etc. without being physically bound to a specific streaming platform. The Eagle streaming API allows developers to easily integrate business logic with the Eagle policy engine, and internally the Eagle framework compiles the business logic execution DAG into program primitives of the underlying stream infrastructure, e.g. Apache Storm. For example, Eagle HDFS monitoring transforms the audit log from the NameNode into objects and joins sensitivity metadata and security zone metadata, which are generated by external programs or configured by the user. Eagle Hive monitoring filters running jobs to get the Hive query string, parses the query string into an object and then joins sensitivity metadata.
>>> >> Alerting Framework: The Eagle Alert Framework includes a stream metadata API, a scalable policy engine framework, and an extensible policy engine framework.
>>> >> The stream metadata API allows developers to declare the event schema, including what attributes constitute an event, what the type of each attribute is, and how to dynamically resolve attribute values at runtime when a user configures a policy. The scalable policy engine framework allows policies to be executed on different physical nodes in parallel. It also allows you to define your own policy partitioner class. The policy engine framework, together with the stream partitioning capability provided by all streaming platforms, makes sure policies and events can be evaluated in a fully distributed way. The extensible policy engine framework allows developers to plug in a new policy engine with a few lines of code. WSO2 Siddhi CEP engine is the policy engine which Eagle supports as a first-class citizen.
>>> >> Machine Learning module: Eagle provides capabilities to define user activity patterns or user profiles for Hadoop users based on user behaviour in the platform. These user profiles are modeled using Machine Learning algorithms and used for detection of anomalous user activities. Eagle uses Eigenvalue Decomposition and Density Estimation algorithms for generating user profile models. The model reads data from HDFS audit logs, preprocesses and aggregates data, and generates models using Spark programming APIs. Once models are generated, Eagle uses the stream processing engine for near real-time anomaly detection to determine whether any user's activities are suspicious.
>>> >>
>>> >> Eagle Services:
>>> >> Query Service: Eagle provides a SQL-like service API to support comprehensive computation over huge sets of data on the fly, e.g.
>>> >> comprehensive filtering, aggregation, histogram, sorting, top, arithmetical expression, pagination etc. HBase is the data storage which Eagle supports as a first-class citizen; relational databases are supported as well. For HBase storage, the Eagle query framework compiles the user-provided SQL-like query into HBase native filter objects and executes it through an HBase coprocessor on the fly.
>>> >> Policy Manager: The Eagle policy manager provides a UI and a RESTful API for users to define policies with just a few clicks. It includes site management UI, policy editor, sensitivity metadata import, HDFS or Hive sensitive resource browsing, alert dashboards etc.
>>> >>
>>> >> Background
>>> >> Data is one of the most important assets for today's businesses, which makes data security one of the top priorities of today's enterprises. Hadoop is widely used across different verticals as a big data repository to store this data in most modern enterprises.
>>> >> At eBay we use the Hadoop platform extensively for our data processing needs. Our data in Hadoop is becoming bigger and bigger as our user base is seeing exponential growth. Today there are a variety of data sets available in the Hadoop cluster for our users to consume. eBay has around 120 PB of data stored in HDFS across 6 different clusters and around 1800+ active Hadoop users consuming data through Hive, HBase and MapReduce jobs every day to build applications using this data. With this astronomical growth of data there are also challenges in securing sensitive data and monitoring the access to this sensitive data. Today in large organizations HDFS is the de facto standard for storing big data.
>>> >> Data sets including, but not limited to, consumer sentiment, social media data, customer segmentation, web clicks, sensor data, geo-location and transaction data get stored in Hadoop for day to day business needs.
>>> >> We at eBay want to make sure the sensitive data and data platforms are completely protected from security breaches. So we partnered very closely with our Information Security team to understand the requirements for Eagle to monitor sensitive data access on Hadoop:
>>> >> 1. Ability to identify and stop security threats in real time
>>> >> 2. Scale for big data (Support PB scale and Billions of events)
>>> >> 3. Ability to create data access policies
>>> >> 4. Support multiple data sources like HDFS, HBase, Hive
>>> >> 5. Visualize alerts in real time
>>> >> 6. Ability to block malicious access in real time
>>> >> We did not find any data access monitoring solution that is available today and can provide the features and functionality that we need to monitor data access in the Hadoop ecosystem at our scale. Hence with an excellent team of world class developers and several users, we have been able to bring Eagle into production as well as open source it.
>>> >>
>>> >> Rationale
>>> >> In today's world, data is an important asset for any company. Businesses are using data extensively to create amazing experiences for users. Data has to be protected and access to data should be secured from security breaches. Today Hadoop is not only used to store logs but also stores financial data, sensitive data sets, geographical data, user click stream data sets etc., which makes it more important to protect it from security breaches. To secure a data platform there are multiple things that need to happen.
>>> >> One is having a strong access control mechanism, which today is provided by Apache Ranger and Apache Sentry. These tools provide fine-grained access control to data sets on Hadoop. But there is a big gap in terms of monitoring all the data access events and activities in order to secure the Hadoop data platform. Together with strong access control, perimeter security and data access monitoring in place, data in the Hadoop clusters can be secured against breaches. We looked around and found the following:
>>> >> • Existing data activity monitoring products are designed for traditional databases and data warehouses.
>>> >> • Existing monitoring platforms cannot scale out to support fast growing data and petabyte scale.
>>> >> • The few products in the industry are still very early in terms of supporting HDFS, Hive, HBase data access monitoring.
>>> >> As mentioned in the background, the business requirement and urgency to secure the data from users with malicious intent drove eBay to invest in building a real time data access monitoring solution from scratch to offer real time alerts and remediation features for malicious data access.
>>> >> With the power of open source distributed systems like Hadoop, Kafka and much more we were able to develop a data activity monitoring system that can scale, identify and stop malicious access in real time.
>>> >> Eagle allows admins to create standard access policies and rules for monitoring HDFS, Hive and HBase data. Eagle also provides out of the box machine learning models for modeling user profiles based on user access behaviour and uses the models to alert on anomalies.
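The user-profile idea in the last paragraph can be illustrated with a small density-estimation sketch. This is not Eagle's model: the proposal only says that Eagle uses Eigenvalue Decomposition and Density Estimation on Spark. The sketch assumes an independent Gaussian per feature, a made-up history of per-interval `[reads, deletes]` counts for one user, and a made-up threshold margin; `fit_profile`, `log_density` and `is_anomalous` are hypothetical names.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Profile = List[Tuple[float, float]]  # (mean, std) per feature


def fit_profile(history: Sequence[Sequence[float]]) -> Profile:
    """Fit an independent Gaussian to each feature column of the history."""
    columns = list(zip(*history))
    # Floor the std so a constant feature cannot cause a division by zero.
    return [(mean(col), max(pstdev(col), 1e-6)) for col in columns]


def log_density(profile: Profile, x: Sequence[float]) -> float:
    """Log-density of x under the product of per-feature Gaussians."""
    total = 0.0
    for (mu, sigma), value in zip(profile, x):
        z = (value - mu) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
    return total


def is_anomalous(profile: Profile, x: Sequence[float], threshold: float) -> bool:
    """Flag activity whose log-density falls below the threshold."""
    return log_density(profile, x) < threshold


# Made-up history: [reads, deletes] per time interval for one user.
history = [[10, 0], [12, 1], [9, 0], [11, 1], [10, 0]]
profile = fit_profile(history)

# Assumed rule: anything much less likely than the least likely
# training interval counts as anomalous.
threshold = min(log_density(profile, h) for h in history) - 5.0
```

With this profile, an interval close to the history such as `[11, 0]` stays above the threshold, while a burst of deletes such as `[10, 50]` falls far below it and would be flagged.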
>>> >>
>>> >> Current Status
>>> >>
>>> >> Meritocracy
>>> >> Eagle has been deployed in production at eBay for monitoring billions of events per day from HDFS and Hive operations. From the start, the product has been built with a focus on high scalability and application extensibility, and Eagle has demonstrated great performance in responding to suspicious events instantly and great flexibility in defining policy.
>>> >>
>>> >> Community
>>> >> Eagle seeks to develop the developer and user communities during incubation.
>>> >>
>>> >> Core Developers
>>> >> Eagle is currently being designed and developed by engineers from eBay Inc.: Edward Zhang, Hao Chen, Chaitali Gupta, Libin Sun, Jilin Jiang, Qingwen Zhao, Senthil Kumar, Hemanth Dendukuri, Arun Manoharan. All of these core developers have deep expertise in developing monitoring products for the Hadoop ecosystem.
>>> >>
>>> >> Alignment
>>> >> The ASF is a natural host for Eagle given that it is already the home of Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big data projects. Eagle leverages a lot of Apache open-source products. Eagle was designed to offer real time insights into sensitive data access by actively monitoring the data access on various data sets in Hadoop and an extensible alerting framework with a powerful policy engine. Eagle complements the existing Hadoop platform area by providing a comprehensive monitoring and alerting solution for detecting sensitive data access threats based on preset policies and machine learning models for user behaviour analysis.
>>> >>
>>> >> Known Risks
>>> >>
>>> >> Orphaned Products
>>> >> The core developers of the Eagle team work full time on this project.
>>> >> There is no risk of Eagle getting orphaned since eBay is extensively using it in its production Hadoop clusters and has plans to go beyond Hadoop. For example, currently there are 7 Hadoop clusters and 2 of them are being monitored using Hadoop Eagle in production. We have plans to extend it to all Hadoop clusters and eventually other data platforms. There are tens of policies onboarded and actively monitored, with plans to onboard more use cases. We are very confident that every Hadoop cluster in the world will be monitored using Eagle for securing the Hadoop ecosystem by actively monitoring for data access on sensitive data. We plan to extend and diversify this community further through Apache. We presented Eagle at the Hadoop Summit in China and garnered interest from different companies who use Hadoop extensively.
>>> >>
>>> >> Inexperience with Open Source
>>> >> The core developers are all active users and followers of open source. They are already committers and contributors to the Eagle GitHub project. All have been involved with the source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Though the core set of developers do not have Apache Open Source experience, there are plans to onboard individuals with Apache open source experience on to the project. Apache Kylin PMC members are also in the same eBay organization. We work very closely with Apache Ranger committers and are looking forward to finding meaningful integrations to improve the security of the Hadoop platform.
>>> >>
>>> >> Homogenous Developers
>>> >> The core developers are from eBay.
>>> >> Today the problem of monitoring data activities to find and stop threats is a universal problem faced by all businesses. The Apache Incubation process encourages an open and diverse meritocratic community. Eagle intends to make every possible effort to build a diverse, vibrant and involved community and has already received substantial interest from various organizations.
>>> >>
>>> >> Reliance on Salaried Developers
>>> >> eBay invested in Eagle as the monitoring solution for Hadoop clusters and some of its key engineers are working full time on the project. In addition, since there is a growing need to secure sensitive data access and for a data activity monitoring solution for Hadoop, we look forward to other Apache developers and researchers contributing to the project. Additional contributors, including Apache committers, have plans to join this effort shortly. Also key to addressing the risk associated with relying on salaried developers from a single entity is to increase the diversity of the contributors and actively lobby for domain experts in the security space to contribute. Eagle intends to do this.
>>> >>
>>> >> Relationships with Other Apache Products
>>> >> Eagle has a strong relationship and dependency with Apache Hadoop, HBase, Spark, Kafka and Storm. Being part of Apache's Incubation community could help with closer collaboration among these projects as well as others.
>>> >>
>>> >> An Excessive Fascination with the Apache Brand
>>> >> Eagle is proposing to enter incubation at Apache in order to help efforts to diversify the committer-base, not so much to capitalize on the Apache brand. The Eagle project is in production use already inside eBay, but is not expected to be an eBay product for external customers.
>>> >> As such, the Eagle project is not seeking to use the Apache brand as a marketing tool.
>>> >>
>>> >> Documentation
>>> >> Information about Eagle can be found at https://github.com/eBay/Eagle. The following link provides more information about Eagle: http://goeagle.io.
>>> >>
>>> >> Initial Source
>>> >> Eagle has been under development since 2014 by a team of engineers at eBay Inc. It is currently hosted on Github.com under the Apache License 2.0 at https://github.com/eBay/Eagle. Once in incubation we will be moving the code base to the Apache Git repository.
>>> >>
>>> >> External Dependencies
>>> >> Eagle has the following external dependencies.
>>> >> Basic
>>> >> • JDK 1.7+
>>> >> • Scala 2.10.4
>>> >> • Apache Maven
>>> >> • JUnit
>>> >> • Log4j
>>> >> • Slf4j
>>> >> • Apache Commons
>>> >> • Apache Commons Math3
>>> >> • Jackson
>>> >> • Siddhi CEP engine
>>> >>
>>> >> Hadoop
>>> >> • Apache Hadoop
>>> >> • Apache HBase
>>> >> • Apache Hive
>>> >> • Apache Zookeeper
>>> >> • Apache Curator
>>> >>
>>> >> Apache Spark
>>> >> • Spark Core Library
>>> >>
>>> >> REST Service
>>> >> • Jersey
>>> >>
>>> >> Query
>>> >> • Antlr
>>> >>
>>> >> Stream processing
>>> >> • Apache Storm
>>> >> • Apache Kafka
>>> >>
>>> >> Web
>>> >> • AngularJS
>>> >> • jQuery
>>> >> • Bootstrap V3
>>> >> • Moment JS
>>> >> • Admin LTE
>>> >> • html5shiv
>>> >> • respond
>>> >> • Fastclick
>>> >> • Date Range Picker
>>> >> • Flot JS
>>> >>
>>> >> Cryptography
>>> >> Eagle will eventually support encryption on the wire. This is not one of the initial goals, and we do not expect Eagle to be a controlled export item due to the use of encryption. Eagle supports but does not require the Kerberos authentication mechanism to access secured Hadoop services.
>>> >>
>>> >> Required Resources
>>> >>
>>> >> Mailing List
>>> >> • eagle-private for private PMC discussions
>>> >> • eagle-dev for developers
>>> >> • eagle-commits for all commits
>>> >> • eagle-users for all eagle users
>>> >>
>>> >> Subversion Directory
>>> >> • Git is the preferred source control system.
>>> >>
>>> >> Issue Tracking
>>> >> • JIRA Eagle (Eagle)
>>> >>
>>> >> Other Resources
>>> >> The existing code already has unit tests so we will make use of existing Apache continuous testing infrastructure. The resulting load should not be very large.
>>> >>
>>> >> Initial Committers
>>> >> • Seshu Adunuthula <sadunuthula at ebay dot com>
>>> >> • Arun Manoharan <armanoharan at ebay dot com>
>>> >> • Edward Zhang <yonzhang at ebay dot com>
>>> >> • Hao Chen <hchen9 at ebay dot com>
>>> >> • Chaitali Gupta <cgupta at ebay dot com>
>>> >> • Libin Sun <libsun at ebay dot com>
>>> >> • Jilin Jiang <jiljiang at ebay dot com>
>>> >> • Qingwen Zhao <qingwzhao at ebay dot com>
>>> >> • Hemanth Dendukuri <hdendukuri at ebay dot com>
>>> >> • Senthil Kumar <senthilkumar at ebay dot com>
>>> >> • Tan Chen <tanchen at ebay dot com>
>>> >>
>>> >> Affiliations
>>> >> The initial committers are employees of eBay Inc.
>>> >>
>>> >> Sponsors
>>> >>
>>> >> Champion
>>> >> • Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>>> >>
>>> >> Nominated Mentors
>>> >> • Owen O'Malley <omalley at apache dot org> - Apache IPMC member, Hortonworks
>>> >> • Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>>> >> • Julian Hyde <jhyde at hortonworks dot com> - Apache IPMC member, Hortonworks
>>> >>
>>> >> Sponsoring Entity
>>> >> We are requesting the Incubator to sponsor this project.
>>> >>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>> For additional commands, e-mail: general-h...@incubator.apache.org
>>
>>--
>>Best Regards,
>>-- Alex