Hi Taylor, I don't know the Spark community's opinion on the "outright vs
subproject" issue, although I have told a couple people in that community about
the proposal and have posted an FYI to the spark-dev list. From a technical
perspective, Spark-Kernel mainly uses public Spark APIs (except for some SparkR
usage; see
https://github.com/ibm-et/spark-kernel/blob/master/sparkr-interpreter/src/main/resources/README.md),
and so I guess the answer could go either way depending on the Spark community.
Thanks,
David
> On November 12, 2015 at 8:05 PM "P. Taylor Goetz" wrote:
>
>
> Just a quick (or maybe not :) ) question...
>
> Given the tight coupling to the Apache Spark project, were there any
> considerations or discussions with the Spark community regarding including the
> Spark-Kernel functionality outright in Spark, or the possibility of becoming a
> subproject?
>
> I'm just curious. I don't think an answer one way or another would necessarily
> block incubation.
>
> -Taylor
>
> > On Nov 12, 2015, at 7:17 PM, da...@fallside.com wrote:
> >
> > Hello, we would like to start a discussion on accepting the Spark-Kernel,
> > a mechanism for applications to interactively and remotely access Apache
> > Spark, into the Apache Incubator.
> >
> > The proposal is available online at
> > https://wiki.apache.org/incubator/SparkKernelProposal, and it is appended
> > to this email.
> >
> > We are looking for additional mentors to help with this project, and we
> > would much appreciate your guidance and advice.
> >
> > Thank you in advance,
> > David Fallside
> >
> >
> >
> > = Spark-Kernel Proposal =
> >
> > == Abstract ==
> > Spark-Kernel provides applications with a mechanism to interactively and
> > remotely access Apache Spark.
> >
> > == Proposal ==
> > The Spark-Kernel enables interactive applications to access Apache Spark
> > clusters. More specifically:
> > * Applications can send code snippets and libraries for execution by Spark
> > (see the sketch after this list)
> > * Applications can be deployed separately from Spark clusters and
> > communicate with the Spark-Kernel using the provided Spark-Kernel client
> > * Execution results and streaming data can be sent back to calling
> > applications
> > * Applications no longer have to be network-connected to the workers on a
> > Spark cluster because the Spark-Kernel acts as each application’s proxy
> > * Work has started on enabling Spark-Kernel to support languages in
> > addition to Scala, namely Python (with PySpark), R (with SparkR), and SQL
> > (with SparkSQL)
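> >
> > As a rough illustration, a code snippet sent by an application is simply
> > ordinary Spark code that the Spark-Kernel evaluates against the
> > SparkContext it holds on the cluster side. A minimal sketch in Scala
> > follows; it assumes, as in the Spark shell, that the kernel exposes the
> > SparkContext as "sc", and it omits the details of how the client library
> > delivers the snippet over the network:
> >
> >   // Snippet submitted by the calling application and executed remotely by
> >   // the Spark-Kernel; "sc" is assumed to be the kernel-side SparkContext,
> >   // as in the Spark shell.
> >   val rdd = sc.parallelize(1 to 1000)    // distribute a local collection
> >   val doubledSum = rdd.map(_ * 2).sum()  // computed on the Spark cluster
> >   doubledSum                             // result returned to the caller
> >
> > The point is that the application ships only the snippet (and any needed
> > libraries), while the Spark-Kernel owns the long-lived Spark connection.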
> >
> > == Background & Rationale ==
> > Apache Spark provides applications with a fast, general-purpose
> > distributed computing engine that supports static and streaming data,
> > tabular and graph representations of data, and an extensive set of
> > machine learning libraries. Consequently, a wide variety of applications
> > will be written for Spark, ranging from interactive applications that
> > require relatively frequent function evaluations to batch-oriented
> > applications that require only one-shot or occasional evaluation.
> >
> > Apache Spark provides two mechanisms for applications to connect to it.
> > The primary mechanism launches applications on Spark clusters using
> > spark-submit
> > (http://spark.apache.org/docs/latest/submitting-applications.html); this
> > requires developers to bundle their application code plus any dependencies
> > into JAR files, and then submit them to Spark. A second mechanism is an
> > ODBC/JDBC API
> > (http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine)
> > which enables applications to issue SQL queries against SparkSQL.
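> >
> > To make the first mechanism concrete, the following is a minimal sketch of
> > the bundle-and-submit workflow in Scala; the class name, input and output
> > paths, and master URL are placeholder assumptions. The application must be
> > compiled and packaged into a JAR, then launched with a command such as
> > ./bin/spark-submit --class com.example.WordCount
> > --master spark://master:7077 my-app.jar
> >
> >   import org.apache.spark.{SparkConf, SparkContext}
> >
> >   // Placeholder batch application illustrating the spark-submit workflow;
> >   // paths and names below are assumptions for this sketch.
> >   object WordCount {
> >     def main(args: Array[String]): Unit = {
> >       val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
> >       val counts = sc.textFile("hdfs:///input/corpus.txt")  // assumed input
> >         .flatMap(_.split("\\s+"))
> >         .map(word => (word, 1))
> >         .reduceByKey(_ + _)
> >       counts.saveAsTextFile("hdfs:///output/word-counts")   // assumed output
> >       sc.stop()
> >     }
> >   }
> >
> > Each change to such an application requires rebuilding the JAR and forking
> > a new spark-submit process, which is what the next paragraph contrasts with
> > interactive, snippet-level access.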
> >
> > Our experience developing interactive applications, such as analytic
> > applications and Jupyter Notebooks, to run against Spark was that the
> > spark-submit mechanism was overly cumbersome and slow (requiring JAR
> > creation and forking processes to run spark-submit), and that the SQL
> > interface was too limiting, offering no easy access to components other
> > than SparkSQL, such as streaming. The most promising mechanism provided by
> > Apache Spark was the command-line shell
> > (http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell)
> > which enabled us to execute code snippets and dynamically control the
> > tasks submitted to a Spark cluster. Spark does not provide the
> > command-line shell as a consumable service, but it provided us with the
> > starting point from which we developed