Re: [PROPOSAL] Knox Hadoop Gateway Project

Alex Karasulu Tue, 12 Feb 2013 01:05:31 -0800

I thought about this a bit last night. If y'all are interested, I could
also mentor the project. That should add some diversity to the mentors
list. I see value in it and would like to see this community succeed.


I'm not affiliated with any company.


On Mon, Feb 11, 2013 at 9:23 PM, Eric Sammer <esam...@cloudera.com> wrote:

> Kevin:
>
> Makes complete sense.
>
> I'd like to offer to join the project, if it's accepted for incubation. I'm
> a committer on MRUnit and Flume, and on the PMC for both. I've helped both
> projects through the incubation phase, and I also know a little bit about
> this Hadoop thing. ;)
>
> Thanks!
>
>
> On Mon, Feb 11, 2013 at 9:28 AM, Kevin Minder
> <kevin.min...@hortonworks.com> wrote:
>
> > Hi Eric,
> > Let me answer your second question first.
> >
> > Q: Is it your intention to provide job submissions and data ingestion
> > APIs for MR and HDFS, respectively?
> > A: Yes, we plan to progress the project to cover all existing ecosystem
> > projects. In addition, the project is based on a modular framework that
> > allows extensions to cover services that are either new or proprietary.
> > Certainly there exist very high volume data ingest use cases for which
> > using a gateway may be impractical, but in general the idea is to
> > support all required client interaction with Hadoop via the gateway.
> >
> > Now for your first question...
> >
> > Q: Can you explain a bit more about what the target use case is?
> > A: One typical use case will be that the gateway will run in a DMZ. It
> > will, as you say, be integrated with various directory services and is
> > extensible to cover those not included. The gateway will then propagate
> > the identity into the Hadoop cluster using Hadoop-specific mechanisms.
> > The key point is that there will typically be a single port open on the
> > client side to the gateway. The Hadoop cluster is firewalled, only
> > providing access to the Hadoop services to the gateway instances.
> > A: Another use case is that an organization is already using some SSO
> > solution and the gateway would be integrated with that to verify any SSO
> > token and then propagate the identity to the Hadoop services.
> >
> > I will collect this and add it to the proposal wiki once I have privs to
> > create the page.
> >
> > Thanks!
> > Kevin.
> >
> >
> > On 2/11/13 12:03 PM, Eric Sammer wrote:
> >
> >> Kevin:
> >>
> >> Interesting proposal. Can you explain a bit more about what the target
> >> use case is? It sounds like there's SSO-ish functionality (presumably a
> >> doAs() machine) with integration with directory services, but the
> >> proposal also mentions a single point for "data and jobs." Is it your
> >> intention to provide job submissions and data ingestion APIs for MR and
> >> HDFS, respectively? Do you plan to target other ecosystem projects such
> >> as HBase? Sorry if I missed this in the proposal.
> >>
> >> Thanks!
> >>
> >>
> >> On Mon, Feb 11, 2013 at 6:55 AM, Kevin Minder
> >> <kevin.min...@hortonworks.com> wrote:
> >>
> >>> Knox Gateway Proposal
> >>>
> >>> == Abstract ==
> >>>
> >>> Knox Gateway is a system that provides a single point of secure access
> >>> for Apache Hadoop clusters.
> >>>
> >>> == Proposal ==
> >>>
> >>> The Knox Gateway (“Gateway” or “Knox”) is a system that provides a
> >>> single point of authentication and access for Apache Hadoop services in
> >>> a cluster. The goal is to simplify Hadoop security for both users (i.e.
> >>> those who access the cluster data and execute jobs) and operators (i.e.
> >>> those who control access and manage the cluster). The Gateway runs as a
> >>> server (or cluster of servers) that serves one or more Hadoop clusters.
> >>>
> >>> * Provide perimeter security to make Hadoop security setup easier
> >>> * Support authentication and token verification security scenarios
> >>> * Deliver users a single cluster end-point that aggregates capabilities
> >>>   for data and jobs
> >>> * Enable integration with enterprise and cloud identity management
> >>>   environments
> >>>
> >>> == Background ==
> >>>
> >>> An Apache Hadoop cluster is presented to consumers as a loose
> >>> collection of independent services. This makes it difficult for users
> >>> to interact with Hadoop since each service maintains its own method of
> >>> access and security. As well, for operators, configuration and
> >>> administration of a secure Hadoop cluster is complex, and many Hadoop
> >>> clusters are insecure as a result.
> >>>
> >>> == Rationale ==
> >>>
> >>> Organizations that struggle with Hadoop cluster security end up either
> >>> a) running Hadoop without security or b) slowing adoption of Hadoop.
> >>> The Gateway aims to provide perimeter security that integrates more
> >>> easily into existing organizations’ security infrastructure. Doing so
> >>> will simplify security for these organizations and benefit all Hadoop
> >>> stakeholders (i.e. users and operators). Additionally, making a
> >>> dedicated perimeter security project part of the Apache Hadoop
> >>> ecosystem will prevent fragmentation in this area and further increase
> >>> the value of Hadoop as a data platform.
> >>>
> >>> == Current Status ==
> >>>
> >>> Prototype available, developed by the list of initial committers.
> >>>
> >>> === Meritocracy ===
> >>>
> >>> We desire to build a diverse developer community around Gateway
> >>> following the Apache Way. We want to make the project open source and
> >>> will encourage contributors from multiple organizations following the
> >>> Apache meritocracy model.
> >>>
> >>> === Community ===
> >>>
> >>> We hope to extend the user and developer base in the future and build a
> >>> solid open source community around Gateway. Apache Hadoop has a large
> >>> ecosystem of open source projects, each with a strong community of
> >>> contributors. All project communities in this ecosystem have an
> >>> opportunity to participate in the advancement of the Gateway project
> >>> because ultimately, Gateway will enable the security capabilities of
> >>> their project to be more enterprise friendly.
> >>>
> >>> === Core Developers ===
> >>>
> >>> Gateway is currently being developed by several engineers from
> >>> Hortonworks: Kevin Minder, Larry McCay, John Speidel, Tom Beerbower and
> >>> Sumit Mohanty. All the engineers have deep expertise in middleware,
> >>> security & identity systems and are quite familiar with the Hadoop
> >>> ecosystem.
> >>>
> >>> === Alignment ===
> >>>
> >>> The ASF is a natural host for Gateway given that it is already the home
> >>> of Hadoop, Hive, Pig, HBase, Oozie and other emerging big data software
> >>> projects. Gateway is designed to solve the security challenges familiar
> >>> to the Hadoop ecosystem family of projects.
> >>>
> >>> == Known Risks ==
> >>>
> >>> === Orphaned products & Reliance on Salaried Developers ===
> >>>
> >>> The core developers plan to work full time on the project. We believe
> >>> that this project will be of general interest to many Hadoop users and
> >>> will attract a diverse set of contributors. We intend to demonstrate
> >>> this by having contributors from several organizations recognized as
> >>> committers by the time Knox graduates from incubation.
> >>>
> >>> === Inexperience with Open Source ===
> >>>
> >>> All of the core developers are active users and followers of open
> >>> source. As well, Hortonworks has a strong heritage of success with
> >>> contributions to Apache Hadoop projects.
> >>>
> >>> === Homogeneous Developers ===
> >>>
> >>> The current core developers are from Hortonworks; however, we hope to
> >>> establish a developer community that includes contributors from several
> >>> corporations.
> >>>
> >>> === Reliance on Salaried Developers ===
> >>>
> >>> Currently, the developers are paid to do work on Gateway. However, once
> >>> the project has a community built around it, we expect to get
> >>> committers and developers from outside the current core developers.
> >>>
> >>> === Relationships with Other Apache Products ===
> >>>
> >>> Gateway is going to be used by the users and operators of Hadoop, and
> >>> the Hadoop ecosystem in general.
> >>>
> >>> === An Excessive Fascination with the Apache Brand ===
> >>>
> >>> Our interest in developing Gateway as an Apache project is to follow an
> >>> established development model. As well, since many of the Hadoop
> >>> ecosystem projects are also part of Apache, Gateway will complement
> >>> those projects by following the same development and contribution
> >>> model.
> >>>
> >>> == Documentation ==
> >>>
> >>> There is documentation in Hortonworks’ internal repositories. It can be
> >>> shared upon request and will be transferred into the Apache CM system
> >>> if this proposal is accepted.
> >>>
> >>> == Initial Source ==
> >>>
> >>> The source is currently in Hortonworks’ internal repositories. The
> >>> process of making this GitHub repository public has been started and
> >>> the URL will be provided once available.
> >>>
> >>> == Source and Intellectual Property Submission Plan ==
> >>>
> >>> The complete Gateway code is under the Apache License, Version 2.0.
> >>>
> >>> == External Dependencies ==
> >>>
> >>> The Gateway dependencies are listed below, separated by Category A and
> >>> Category B as defined in the Apache Third-Party Licensing Policy. Note:
> >>> These are the direct dependencies. Indirect dependencies are not
> >>> included.
> >>>
> >>> === Category A Dependencies ===
> >>>
> >>> Apache Commons - ASLv2.0
> >>> commons-io:commons-io#2.4
> >>> commons-cli:commons-cli#1.2
> >>> commons-codec:commons-codec#1.7
> >>> org.apache.commons:commons-digester3#3.2
> >>> org.apache.commons:commons-vfs2#2.0
> >>> Apache Hadoop - ASLv2.0
> >>> org.apache.hadoop:hadoop-auth#0.23.3
> >>> org.apache.hadoop:hadoop-core#1.0.3
> >>> Apache Geronimo - ASLv2.0
> >>> org.apache.geronimo.components:geronimo-jaspi#2.0.0
> >>> org.apache.geronimo.specs:geronimo-osgi-locator#1.1
> >>> Apache Shiro - ASLv2.0
> >>> org.apache.shiro:shiro-web#1.2.1
> >>> ApacheDS - ASLv2.0
> >>> org.apache.directory.server:apacheds-all#1.5.5
> >>>
> >>> Log4J - ASLv2.0
> >>> log4j:log4j#1.2.17
> >>> SLF4J - MIT
> >>> org.slf4j:slf4j-api#1.6.6
> >>> org.slf4j:slf4j-log4j12#1.6.6
> >>> Guava - ASLv2.0
> >>> com.google.guava:guava#14.0-rc1
> >>> HttpClient - ASLv2.0
> >>> org.apache.httpcomponents:httpclient#4.2.1
> >>> Jetty - ASLv2.0
> >>> org.eclipse.jetty:jetty-server#8.1.7.v20120910
> >>> org.eclipse.jetty:jetty-servlet#8.1.7.v20120910
> >>> org.eclipse.jetty:jetty-webapp#8.1.7.v20120910
> >>> org.eclipse.jetty:jetty-jaspi#8.1.7.v20120910
> >>> org.eclipse.jetty.aggregate:jetty-all#8.1.7.v20120910
> >>> org.eclipse.jetty:test-jetty-servlet#8.1.7.v20120910
> >>> Spring Security - ASLv2.0
> >>> org.springframework:spring-core#3.1.3.RELEASE
> >>> org.springframework:spring-context#3.1.3.RELEASE
> >>> org.springframework:spring-web#3.1.3.RELEASE
> >>> org.springframework.security:spring-security-core#3.1.3.RELEASE
> >>> org.springframework.security:spring-security-web#3.1.3.RELEASE
> >>> org.springframework.security:spring-security-config#3.1.3.RELEASE
> >>> org.springframework.security:spring-security-ldap#3.1.2.RELEASE
> >>> org.springframework.ldap:spring-ldap-core#1.3.1.RELEASE
> >>> org.springframework.ldap:spring-ldap-core-tiger#1.3.1.RELEASE
> >>> org.springframework.ldap:spring-ldap-odm#1.3.1.RELEASE
> >>> org.springframework.ldap:spring-ldap-ldif-core#1.3.1.RELEASE
> >>> org.springframework.ldap:spring-ldap-ldif-batch#1.3.1.RELEASE
> >>> JBoss ShrinkWrap - ASLv2.0
> >>> org.jboss.shrinkwrap:shrinkwrap-api#1.0.1
> >>> org.jboss.shrinkwrap:shrinkwrap-impl-base#1.0.1
> >>> org.jboss.shrinkwrap.descriptors:shrinkwrap-descriptors-api-javaee#2.0.0-alpha-4
> >>> org.jboss.shrinkwrap.descriptors:shrinkwrap-descriptors-impl-javaee#2.0.0-alpha-4
> >>>
> >>>
> >>> === Category A Dependencies (Test) ===
> >>>
> >>> EasyMock - ASLv2.0
> >>> org.easymock:easymock#3.0
> >>> XML Matchers - ASLv2.0
> >>> org.xmlmatchers:xml-matchers#0.10
> >>>
> >>> Hamcrest - BSDv3
> >>> org.hamcrest:hamcrest-api#1.0
> >>> org.hamcrest:hamcrest-core#1.2.1
> >>> org.hamcrest:hamcrest-library#1.2.1
> >>> JsonPath - ASLv2.0
> >>> com.jayway.jsonpath:json-path#0.8.1
> >>> com.jayway.jsonpath:json-path-assert#0.8.1
> >>>
> >>> XMLTool - ASLv2.0
> >>> com.mycila.xmltool:xmltool#3.3
> >>> REST-assured - ASLv2.0
> >>> com.jayway.restassured:rest-assured#1.6.2
> >>>
> >>>
> >>> === Category B Dependencies ===
> >>>
> >>> Jersey - CDDLv1.1 or GPL2wCPE
> >>> com.sun.jersey:jersey-server#1.14
> >>> com.sun.jersey:jersey-servlet#1.14
> >>> Jericho - EPLv1.0
> >>> net.htmlparser.jericho:jericho-html#3.2
> >>>
> >>> Servlet - CDDLv1.0 or GPLv2
> >>> javax.servlet:javax.servlet-api#3.0.1
> >>>
> >>> JUnit - CPLv1.0
> >>> junit:junit#4.11
> >>>
> >>> == Cryptography ==
> >>>
> >>> The Gateway uses cryptographic software indirectly as a result of
> >>> having two dependencies: ApacheDS and Apache Shiro. Gateway does not
> >>> include any special or custom cryptographic technologies.
> >>>
> >>> ApacheDS is an ASF project and has been classified Export Commodity
> >>> Control Number (ECCN) 5D002.C.1 due to its dependency on Bouncy Castle.
> >>> More information on the ApacheDS classification can be found at
> >>> http://svn.apache.org/repos/asf/directory/apacheds/trunk/installers/README
> >>>
> >>>
> >>> Apache Shiro is an ASF project and has been classified Export Commodity
> >>> Control Number (ECCN) 5D002.C.1. More information on the Apache Shiro
> >>> classification can be found at
> >>> http://svn.apache.org/repos/asf/shiro/trunk/README
> >>>
> >>>
> >>> == Required Resources ==
> >>>
> >>> === Mailing lists ===
> >>>
> >>> knox-dev AT incubator DOT apache DOT org
> >>> knox-commits AT incubator DOT apache DOT org
> >>> knox-user AT incubator DOT apache DOT org
> >>> knox-private AT incubator DOT apache DOT org
> >>>
> >>> === Subversion Directory ===
> >>>
> >>> https://svn.apache.org/repos/asf/incubator/knox
> >>>
> >>>
> >>> === Issue Tracking ===
> >>>
> >>> JIRA Knox (KNOX)
> >>>
> >>> == Initial Committers ==
> >>>
> >>> Kevin Minder (kevin DOT minder AT hortonworks DOT com)
> >>> Larry McCay (lmccay AT hortonworks DOT com)
> >>> John Speidel (jspeidel AT hortonworks DOT com)
> >>> Tom Beerbower (tbeerbower AT hortonworks DOT com)
> >>> Sumit Mohanty (smohanty AT hortonworks DOT com)
> >>>
> >>> == Affiliations ==
> >>>
> >>> Kevin Minder (Hortonworks)
> >>> Larry McCay (Hortonworks)
> >>> John Speidel (Hortonworks)
> >>> Tom Beerbower (Hortonworks)
> >>> Sumit Mohanty (Hortonworks)
> >>>
> >>> == Sponsors ==
> >>>
> >>> === Champion ===
> >>>
> >>> Devaraj Das (ddas AT apache DOT org)
> >>>
> >>> === Nominated Mentors ===
> >>>
> >>> Owen O’Malley (omalley AT apache DOT org)
> >>> Mahadev Konar (mahadev AT apache DOT org)
> >>> Alan Gates (gates AT apache DOT org)
> >>> Devaraj Das (ddas AT apache DOT org)
> >>>
> >>> === Sponsoring Entity ===
> >>>
> >>> Incubator PMC
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >>> For additional commands, e-mail: general-help@incubator.apache.org
> >>>
> >>>
> >>>
> >>
> >
> >
> >
>
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
>



-- 
Best Regards,
-- Alex
