Thanks Don. Looking forward to working with the Ranger team on meaningful
integrations.

On 10/20/15, 10:19 PM, "Don Bosco Durai" <bo...@apache.org> wrote:

>Hi Arun
>
>This looks really good and fills some obvious gaps in the security
>landscape.
>
>Happy to contribute any way you want.
>
>All the best!!!
>
>Bosco
>
>
>
>
>
>On 10/20/15, 8:02 AM, "Alex Karasulu" <akaras...@gmail.com on behalf of
>akaras...@apache.org> wrote:
>
>>Hi Arun,
>>
>>Eagle sounds very promising. I just had a discussion with someone about
>>this exact need. I do however agree with Greg on the name. As far as I
>>can see, besides the name, your weakest point is the all eBay employed
>>team. It's not a blocker and can be fixed during incubation. Good luck
>>to you.
>>
>>Alex
>>
>>
>>On Tue, Oct 20, 2015 at 5:51 PM, Manoharan, Arun <armanoha...@ebay.com>
>>wrote:
>>
>>> Hi Greg,
>>>
>>> Thank you for reviewing the proposal.
>>>
>>> Originally we thought Eagle might be trademarked by someone already,
>>> but I went through the eBay legal team to get clearance for the name
>>> to be used. We will look into it again to see if there will be
>>> potential problems.
>>>
>>> Thanks,
>>> Arun
>>>
>>> On 10/20/15, 1:52 AM, "Greg Stein" <gst...@gmail.com> wrote:
>>>
>>> >Hey there, Arun! ... I have no commentary on the proposal itself, as
>>> >it looks like a great proposal. I would suggest being a bit wary of
>>> >the name, as "Eagle" is a *very* popular PCB design program.
>>> >
>>> >On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun
>>> ><armanoha...@ebay.com> wrote:
>>> >
>>> >> Hello Everyone,
>>> >>
>>> >> My name is Arun Manoharan. I am currently a product manager on the
>>> >> Analytics platform team at eBay Inc.
>>> >>
>>> >> I would like to start a discussion on Eagle and its joining the ASF
>>> >> as an incubation project.
>>> >>
>>> >> Eagle is a monitoring solution for Hadoop that instantly identifies
>>> >> access to sensitive data, recognizes attacks and malicious
>>> >> activities, and takes action in real time. Eagle supports a wide
>>> >> variety of policies on HDFS data and Hive. Eagle also provides
>>> >> machine learning models for detecting anomalous user behavior in
>>> >> Hadoop.
>>> >>
>>> >> The proposal is available on the wiki here:
>>> >> https://wiki.apache.org/incubator/EagleProposal
>>> >>
>>> >> The text of the proposal is also available at the end of this email.
>>> >>
>>> >> Thanks for your time and help.
>>> >>
>>> >> Thanks,
>>> >> Arun
>>> >>
>>> >> <COPY of the proposal in text format>
>>> >>
>>> >> Eagle
>>> >>
>>> >> Abstract
>>> >> Eagle is an open source monitoring solution for Hadoop that instantly
>>> >> identifies access to sensitive data, recognizes attacks and malicious
>>> >> activities in Hadoop, and takes action.
>>> >>
>>> >> Proposal
>>> >> Eagle audits access to HDFS files and to Hive and HBase tables in
>>> >> real time, enforces policies defined on sensitive data access, and
>>> >> alerts on or blocks a user's access to that sensitive data in real
>>> >> time. Eagle also creates user profiles based on typical access
>>> >> behaviour for HDFS and Hive and sends alerts when anomalous behaviour
>>> >> is detected. Eagle can also import sensitive data information
>>> >> classified by external classification engines to help define its
>>> >> policies.
>>> >>
>>> >> Overview of Eagle
>>> >> Eagle has 3 main parts.
>>> >> 1. Data collection and storage - Eagle collects data from various
>>> >> Hadoop logs in real time using the Kafka/YARN API and uses HDFS and
>>> >> HBase for storage.
>>> >> 2. Data processing and policy engine - Eagle allows users to create
>>> >> policies based on various metadata properties of HDFS, Hive and HBase
>>> >> data.
>>> >> 3. Eagle services - Eagle services include the policy manager, the
>>> >> query service and the visualization component. Eagle provides an
>>> >> intuitive user interface to administer Eagle and an alert dashboard
>>> >> to respond to real-time alerts.
>>> >>
>>> >> Data Collection and Storage:
>>> >> Eagle provides a programming API for integrating any data source
>>> >> into the Eagle policy evaluation framework. For example, Eagle HDFS
>>> >> audit monitoring collects data from Kafka, which is populated from a
>>> >> NameNode log4j appender or from a Logstash agent. Eagle Hive
>>> >> monitoring collects Hive query logs from running jobs through the
>>> >> YARN API, which is designed to be scalable and fault-tolerant. Eagle
>>> >> uses HBase to store metadata and metrics data, and also supports
>>> >> relational databases through a configuration change.
>>> >>
>>> >> Data Processing and Policy Engine:
>>> >> Processing Engine: Eagle provides a stream processing API which is
>>> >> an abstraction over Apache Storm. It can also be extended to other
>>> >> streaming engines. This abstraction allows developers to assemble
>>> >> data transformation, filtering, external data joins etc. without
>>> >> being physically bound to a specific streaming platform. The Eagle
>>> >> streaming API allows developers to easily integrate business logic
>>> >> with the Eagle policy engine; internally, the Eagle framework
>>> >> compiles the business logic execution DAG into program primitives of
>>> >> the underlying stream infrastructure, e.g. Apache Storm. For example,
>>> >> Eagle HDFS monitoring transforms audit logs from the NameNode into
>>> >> objects and joins them with sensitivity metadata and security zone
>>> >> metadata, which are generated by external programs or configured by
>>> >> the user. Eagle Hive monitoring filters running jobs to get the Hive
>>> >> query string, parses the query string into an object, and then joins
>>> >> it with sensitivity metadata.
>>> >> Alerting Framework: The Eagle alert framework includes a stream
>>> >> metadata API, a scalable policy engine framework and an extensible
>>> >> policy engine framework. The stream metadata API allows developers
>>> >> to declare an event schema, including which attributes constitute an
>>> >> event, the type of each attribute, and how to dynamically resolve
>>> >> attribute values at runtime when a user configures a policy. The
>>> >> scalable policy engine framework allows policies to be executed on
>>> >> different physical nodes in parallel. It also lets you define your
>>> >> own policy partitioner class. The policy engine framework, together
>>> >> with the stream partitioning capability provided by all streaming
>>> >> platforms, makes sure policies and events can be evaluated in a
>>> >> fully distributed way. The extensible policy engine framework allows
>>> >> developers to plug in a new policy engine with a few lines of code.
>>> >> The WSO2 Siddhi CEP engine is the policy engine which Eagle supports
>>> >> as a first-class citizen.
>>> >> Machine Learning module: Eagle provides capabilities to define user
>>> >> activity patterns or user profiles for Hadoop users based on user
>>> >> behaviour in the platform. These user profiles are modeled using
>>> >> machine learning algorithms and used to detect anomalous user
>>> >> activities. Eagle uses eigenvalue decomposition and density
>>> >> estimation algorithms for generating user profile models. The module
>>> >> reads data from HDFS audit logs, preprocesses and aggregates the
>>> >> data, and generates models using Spark programming APIs. Once models
>>> >> are generated, Eagle uses the stream processing engine for near
>>> >> real-time anomaly detection to determine whether any user's
>>> >> activities are suspicious.
>>> >>
>>> >> Eagle Services:
>>> >> Query Service: Eagle provides a SQL-like service API to support
>>> >> comprehensive computation over huge data sets on the fly, e.g.
>>> >> comprehensive filtering, aggregation, histogram, sorting, top,
>>> >> arithmetic expression, pagination etc. HBase is the data storage
>>> >> which Eagle supports as a first-class citizen; relational databases
>>> >> are supported as well. For HBase storage, the Eagle query framework
>>> >> compiles a user-provided SQL-like query into HBase native filter
>>> >> objects and executes them through an HBase coprocessor on the fly.
>>> >> Policy Manager: The Eagle policy manager provides a UI and a RESTful
>>> >> API for users to define policies with just a few clicks. It includes
>>> >> a site management UI, a policy editor, sensitivity metadata import,
>>> >> HDFS or Hive sensitive resource browsing, alert dashboards etc.
>>> >>
>>> >> Background
>>> >> Data is one of the most important assets for today's businesses,
>>> >> which makes data security one of the top priorities of today's
>>> >> enterprises. Hadoop is widely used across different verticals as a
>>> >> big data repository to store this data in most modern enterprises.
>>> >> At eBay we use the Hadoop platform extensively for our data
>>> >> processing needs. Our data in Hadoop is becoming bigger and bigger
>>> >> as our user base sees exponential growth. Today there are a variety
>>> >> of data sets available in the Hadoop cluster for our users to
>>> >> consume. eBay has around 120 PB of data stored in HDFS across 6
>>> >> different clusters and 1800+ active Hadoop users consuming data
>>> >> through Hive, HBase and MapReduce jobs every day to build
>>> >> applications using this data. With this astronomical growth of data
>>> >> come challenges in securing sensitive data and monitoring access to
>>> >> it. Today in large organizations HDFS is the de facto standard for
>>> >> storing big data. Data sets including, but not limited to, consumer
>>> >> sentiment, social media data, customer segmentation, web clicks,
>>> >> sensor data, geo-location and transaction data get stored in Hadoop
>>> >> for day-to-day business needs.
>>> >> We at eBay want to make sure the sensitive data and data platforms
>>> >> are completely protected from security breaches. So we partnered
>>> >> very closely with our Information Security team to understand the
>>> >> requirements for Eagle to monitor sensitive data access on Hadoop:
>>> >> 1. Ability to identify and stop security threats in real time
>>> >> 2. Scale for big data (support PB scale and billions of events)
>>> >> 3. Ability to create data access policies
>>> >> 4. Support multiple data sources like HDFS, HBase, Hive
>>> >> 5. Visualize alerts in real time
>>> >> 6. Ability to block malicious access in real time
>>> >> We did not find any data access monitoring solution available today
>>> >> that can provide the features and functionality we need to monitor
>>> >> data access in the Hadoop ecosystem at our scale. Hence, with an
>>> >> excellent team of world-class developers and several users, we have
>>> >> been able to bring Eagle into production as well as open source it.
>>> >>
>>> >> Rationale
>>> >> In today's world, data is an important asset for any company.
>>> >> Businesses are using data extensively to create amazing experiences
>>> >> for users. Data has to be protected, and access to data should be
>>> >> secured against security breaches. Today Hadoop is used not only to
>>> >> store logs but also to store financial data, sensitive data sets,
>>> >> geographical data, user click stream data sets etc., which makes it
>>> >> more important to protect it from security breaches. To secure a
>>> >> data platform there are multiple things that need to happen. One is
>>> >> having a strong access control mechanism, which today is provided by
>>> >> Apache Ranger and Apache Sentry. These tools provide fine-grained
>>> >> access control for data sets on Hadoop. But there is a big gap in
>>> >> terms of monitoring all the data access events and activities in
>>> >> order to secure the Hadoop data platform. With strong access
>>> >> control, perimeter security and data access monitoring together in
>>> >> place, data in Hadoop clusters can be secured against breaches. We
>>> >> looked around and found the following:
>>> >> Existing data activity monitoring products are designed for
>>> >> traditional databases and data warehouses. Existing monitoring
>>> >> platforms cannot scale out to support fast-growing data and petabyte
>>> >> scale. The few products in the industry that support HDFS, Hive and
>>> >> HBase data access monitoring are still at a very early stage.
>>> >> As mentioned in the background, the business requirement and urgency
>>> >> to secure the data from users with malicious intent drove eBay to
>>> >> invest in building a real-time data access monitoring solution from
>>> >> scratch to offer real-time alerts and remediation features for
>>> >> malicious data access.
>>> >> With the power of open source distributed systems like Hadoop, Kafka
>>> >> and many more, we were able to develop a data activity monitoring
>>> >> system that can scale and can identify and stop malicious access in
>>> >> real time.
>>> >> Eagle allows admins to create standard access policies and rules for
>>> >> monitoring HDFS, Hive and HBase data. Eagle also provides
>>> >> out-of-the-box machine learning models for modeling user profiles
>>> >> based on user access behaviour and uses the models to alert on
>>> >> anomalies.
>>> >>
>>> >> Current Status
>>> >>
>>> >> Meritocracy
>>> >> Eagle has been deployed in production at eBay for monitoring
>>> >> billions of events per day from HDFS and Hive operations. From the
>>> >> start, the product has been built with a focus on high scalability
>>> >> and application extensibility, and Eagle has demonstrated great
>>> >> performance in responding to suspicious events instantly and great
>>> >> flexibility in defining policies.
>>> >>
>>> >> Community
>>> >> Eagle seeks to develop the developer and user communities during
>>> >> incubation.
>>> >>
>>> >> Core Developers
>>> >> Eagle is currently being designed and developed by engineers from
>>> >> eBay Inc.: Edward Zhang, Hao Chen, Chaitali Gupta, Libin Sun, Jilin
>>> >> Jiang, Qingwen Zhao, Senthil Kumar, Hemanth Dendukuri, Arun
>>> >> Manoharan. All of these core developers have deep expertise in
>>> >> developing monitoring products for the Hadoop ecosystem.
>>> >>
>>> >> Alignment
>>> >> The ASF is a natural host for Eagle given that it is already the
>>> >> home of Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging
>>> >> big data projects. Eagle leverages many Apache open source products.
>>> >> Eagle was designed to offer real-time insights into sensitive data
>>> >> access by actively monitoring data access on various data sets in
>>> >> Hadoop, together with an extensible alerting framework and a
>>> >> powerful policy engine. Eagle complements the existing Hadoop
>>> >> platform by providing a comprehensive monitoring and alerting
>>> >> solution for detecting sensitive data access threats based on preset
>>> >> policies and machine learning models for user behaviour analysis.
>>> >>
>>> >> Known Risks
>>> >>
>>> >> Orphaned Products
>>> >> The core developers of the Eagle team work full time on this
>>> >> project. There is no risk of Eagle getting orphaned since eBay is
>>> >> using it extensively in its production Hadoop clusters and has plans
>>> >> to go beyond Hadoop. For example, currently there are 7 Hadoop
>>> >> clusters and 2 of them are being monitored using Hadoop Eagle in
>>> >> production. We have plans to extend it to all Hadoop clusters and
>>> >> eventually other data platforms. There are tens of policies
>>> >> onboarded and actively monitored, with plans to onboard more use
>>> >> cases. We are very confident that every Hadoop cluster in the world
>>> >> will be monitored using Eagle for securing the Hadoop ecosystem by
>>> >> actively monitoring data access on sensitive data. We plan to extend
>>> >> and diversify this community further through Apache. We presented
>>> >> Eagle at the Hadoop Summit in China and garnered interest from
>>> >> different companies who use Hadoop extensively.
>>> >>
>>> >> Inexperience with Open Source
>>> >> The core developers are all active users and followers of open
>>> >> source. They are already committers and contributors to the Eagle
>>> >> GitHub project. All have been involved with source code that has
>>> >> been released under an open source license, and several of them also
>>> >> have experience developing code in an open source environment.
>>> >> Though the core set of developers do not have Apache open source
>>> >> experience, there are plans to onboard individuals with Apache open
>>> >> source experience onto the project. Apache Kylin PMC members are
>>> >> also in the same eBay organization. We work very closely with Apache
>>> >> Ranger committers and are looking forward to finding meaningful
>>> >> integrations to improve the security of the Hadoop platform.
>>> >>
>>> >> Homogenous Developers
>>> >> The core developers are from eBay. Today the problem of monitoring
>>> >> data activities to find and stop threats is a universal problem
>>> >> faced by all businesses. The Apache Incubation process encourages an
>>> >> open and diverse meritocratic community. Eagle intends to make every
>>> >> possible effort to build a diverse, vibrant and involved community
>>> >> and has already received substantial interest from various
>>> >> organizations.
>>> >>
>>> >> Reliance on Salaried Developers
>>> >> eBay invested in Eagle as the monitoring solution for its Hadoop
>>> >> clusters, and some of its key engineers are working full time on the
>>> >> project. In addition, since there is a growing need to secure
>>> >> sensitive data access and hence for a data activity monitoring
>>> >> solution for Hadoop, we look forward to other Apache developers and
>>> >> researchers contributing to the project. Additional contributors,
>>> >> including Apache committers, have plans to join this effort shortly.
>>> >> Also key to addressing the risk associated with relying on salaried
>>> >> developers from a single entity is to increase the diversity of the
>>> >> contributors and actively lobby for domain experts in the security
>>> >> space to contribute. Eagle intends to do this.
>>> >>
>>> >> Relationships with Other Apache Products
>>> >> Eagle has a strong relationship with and dependency on Apache
>>> >> Hadoop, HBase, Spark, Kafka and Storm. Being part of Apache's
>>> >> Incubation community could help with closer collaboration among
>>> >> these projects as well as others.
>>> >>
>>> >> An Excessive Fascination with the Apache Brand
>>> >> Eagle is proposing to enter incubation at Apache in order to help
>>> >> efforts to diversify the committer base, not so much to capitalize
>>> >> on the Apache brand. The Eagle project is in production use already
>>> >> inside eBay, but is not expected to be an eBay product for external
>>> >> customers. As such, the Eagle project is not seeking to use the
>>> >> Apache brand as a marketing tool.
>>> >>
>>> >> Documentation
>>> >> Information about Eagle can be found at https://github.com/eBay/Eagle.
>>> >> The following link provides more information about Eagle:
>>> >> http://goeagle.io.
>>> >>
>>> >> Initial Source
>>> >> Eagle has been under development since 2014 by a team of engineers
>>> >> at eBay Inc. It is currently hosted on GitHub under the Apache
>>> >> License 2.0 at https://github.com/eBay/Eagle. Once in incubation we
>>> >> will be moving the code base to an Apache Git repository.
>>> >>
>>> >> External Dependencies
>>> >> Eagle has the following external dependencies.
>>> >> Basic
>>> >> - JDK 1.7+
>>> >> - Scala 2.10.4
>>> >> - Apache Maven
>>> >> - JUnit
>>> >> - Log4j
>>> >> - Slf4j
>>> >> - Apache Commons
>>> >> - Apache Commons Math3
>>> >> - Jackson
>>> >> - Siddhi CEP engine
>>> >>
>>> >> Hadoop
>>> >> - Apache Hadoop
>>> >> - Apache HBase
>>> >> - Apache Hive
>>> >> - Apache Zookeeper
>>> >> - Apache Curator
>>> >>
>>> >> Apache Spark
>>> >> - Spark Core Library
>>> >>
>>> >> REST Service
>>> >> - Jersey
>>> >>
>>> >> Query
>>> >> - Antlr
>>> >>
>>> >> Stream processing
>>> >> - Apache Storm
>>> >> - Apache Kafka
>>> >>
>>> >> Web
>>> >> - AngularJS
>>> >> - jQuery
>>> >> - Bootstrap V3
>>> >> - Moment JS
>>> >> - Admin LTE
>>> >> - html5shiv
>>> >> - respond
>>> >> - Fastclick
>>> >> - Date Range Picker
>>> >> - Flot JS
>>> >>
>>> >> Cryptography
>>> >> Eagle will eventually support encryption on the wire. This is not
>>> >> one of the initial goals, and we do not expect Eagle to be a
>>> >> controlled export item due to the use of encryption. Eagle supports
>>> >> but does not require the Kerberos authentication mechanism to access
>>> >> secured Hadoop services.
>>> >>
>>> >> Required Resources
>>> >>
>>> >> Mailing List
>>> >> - eagle-private for private PMC discussions
>>> >> - eagle-dev for developers
>>> >> - eagle-commits for all commits
>>> >> - eagle-users for all eagle users
>>> >>
>>> >> Subversion Directory
>>> >> - Git is the preferred source control system.
>>> >>
>>> >> Issue Tracking
>>> >> - JIRA Eagle (Eagle)
>>> >>
>>> >> Other Resources
>>> >> The existing code already has unit tests so we will make use of
>>> >> existing Apache continuous testing infrastructure. The resulting
>>> >> load should not be very large.
>>> >>
>>> >> Initial Committers
>>> >> - Seshu Adunuthula <sadunuthula at ebay dot com>
>>> >> - Arun Manoharan <armanoharan at ebay dot com>
>>> >> - Edward Zhang <yonzhang at ebay dot com>
>>> >> - Hao Chen <hchen9 at ebay dot com>
>>> >> - Chaitali Gupta <cgupta at ebay dot com>
>>> >> - Libin Sun <libsun at ebay dot com>
>>> >> - Jilin Jiang <jiljiang at ebay dot com>
>>> >> - Qingwen Zhao <qingwzhao at ebay dot com>
>>> >> - Hemanth Dendukuri <hdendukuri at ebay dot com>
>>> >> - Senthil Kumar <senthilkumar at ebay dot com>
>>> >> - Tan Chen <tanchen at ebay dot com>
>>> >>
>>> >> Affiliations
>>> >> The initial committers are employees of eBay Inc.
>>> >>
>>> >> Sponsors
>>> >>
>>> >> Champion
>>> >> - Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>>> >>
>>> >> Nominated Mentors
>>> >> - Owen O'Malley <omalley at apache dot org> - Apache IPMC member,
>>> >> Hortonworks
>>> >> - Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
>>> >> - Julian Hyde <jhyde at hortonworks dot com> - Apache IPMC member,
>>> >> Hortonworks
>>> >>
>>> >> Sponsoring Entity
>>> >> We are requesting the Incubator to sponsor this project.
>>> >>
>>> >>
>>> >>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>> For additional commands, e-mail: general-h...@incubator.apache.org
>>>
>>>
>>
>>
>>-- 
>>Best Regards,
>>-- Alex
>
>
>
