On Sun, Jan 22, 2017 at 10:24 PM Marvin Humphrey <mar...@rectangular.com>
wrote:

> On Sat, Jan 21, 2017 at 9:34 AM, John D. Ament <johndam...@apache.org>
> wrote:
> > On Sat, Jan 21, 2017 at 12:19 PM Marvin Humphrey <mar...@rectangular.com>
> > wrote:
> >
> >> On Sat, Jan 21, 2017 at 6:41 AM, John D. Ament <john.d.am...@gmail.com>
> >> wrote:
> >> > However, regarding the binaries. In a recent discussion (on legal-discuss)
> >> > it was decided that this was OK. Ideally the NOTICE would include the
> >> > information on the binary's source of origin (assuming that the source was
> >> > eligible to be licensed this way). In this case, the .tar.gz is actually
> >> > the distribution of Apache Spark R that looks like it's required to build
> >> > Toree.
> >>
> >> I must have missed this on legal-discuss, and it's counter to my
> >> understanding. Can you please provide a link?
> >>
> >> Here is something I wrote to legal-discuss recently, which talks about
> >> some of the security reasons why bundling a binary dependency is
> >> problematic: https://s.apache.org/OuNX
> >>
> >>
> > Same thread. Specifically Mark T's response [1] and Craig's affirmation [2].
> >
> > [1]: https://lists.apache.org/thread.html/995d9ddda07363faff5306154ff3a3aa100a07aad191785d866ae097@%3Clegal-discuss.apache.org%3E
> > [2]: https://lists.apache.org/thread.html/5f10a28e5f7bf117599d35e14a00290453c1741d614605950ca897c1@%3Clegal-discuss.apache.org%3E
>
> Let me be clear: compiled code does not belong in our official source
> releases.
>
> Here's the relevant policy clause:
>
>   http://www.apache.org/legal/release-policy#compiled-packages
>
>
My interpretation of the term "compiled code" is compiled versions of the
source code within the package. When I look at Toree's package I see three
different use cases.

1. Test JARs, exactly like what OWB was doing. They're doing dynamic JAR
loading and want to test that a class can be loaded from a JAR. That covers
these files:
./scala-interpreter/src/test/resources/ScalaTestJar.jar
./scala-interpreter/src/test/resources/TestJar.jar
./scala-interpreter/src/test/resources/TestJar2.jar

2. There are external dependencies for executing tests. Packaging an external
JAR seems to have been a convenience. It looks like it originates from Apache
Spark, which fixed this on their side back in July:
https://issues.apache.org/jira/browse/SPARK-10683
So Toree may need to include a similar fix.

sparkr-interpreter/src/main/resources/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar

3. The R source code from Apache Spark's R module, packaged as a tar.gz file:

sparkr-interpreter/src/main/resources/sparkr_bundle.tar.gz

For the last two, I don't have a clear understanding of how they're used in
the build. But beyond asking that the source files be included somewhere in
the distribution, and that the build be updated to generate these files
dynamically, I'm not sure what else to ask for.

I'll point out that situations like this are exactly why I pushed to remove
most of the Incubator-specific release policy and push instead for improved
foundation-wide policy. I suspect that Toree did all of this in their release
package because Apache Spark was already doing it, they were leveraging Spark
functionality, and the assumption was that if a TLP is doing it, it must be
correct.




>   The Apache Software Foundation produces open source software. All releases
>   are in the form of the source materials needed to make changes to the
>   software being released.
>
> Creating releases which adhere to this policy is almost always
> straightforward. Just because there are some edge cases where we have to
> apply judgment doesn't invalidate the policy and allow willy-nilly bundling
> of binaries.
>
> The OpenWebBeans case from legal-discuss was just such an edge case. The
> .class file wasn't on the class path and was used only when running unit
> tests for some bytecode stuff. This is quite difficult to exploit.
>
> The debate on legal-discuss was over whether it was worth doing anything
> about, because it was more of a binary resource (like a .jpeg) than compiled
> object form. In the end we didn't even make a policy exception because the
> project applied a workaround -- they extracted the bytecode out of the
> .class file and encoded it as a static variable in the source file.
>
>
This wasn't my interpretation of the conversation on legal-discuss.  It may
be that we need to circle back on that thread and get a definitive answer.

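For reference, the OpenWebBeans workaround Marvin describes (bytecode encoded as a static variable in a source file) could be produced by a small generator along these lines. This is my own illustrative sketch, not what OWB actually used; the function name `class_bytes_to_java` is hypothetical.

```python
def class_bytes_to_java(const_name: str, data: bytes) -> str:
    """Render raw .class bytes as a Java `static final byte[]` declaration.

    Java's byte type is signed, so values above 127 are written as
    negative numbers (e.g. 0xCA becomes -54).
    """
    values = [b - 256 if b > 127 else b for b in data]
    # Wrap at 12 values per line so changes show up as a readable text diff.
    lines = []
    for i in range(0, len(values), 12):
        chunk = values[i:i + 12]
        lines.append("        " + ", ".join(str(v) for v in chunk) + ",")
    body = "\n".join(lines)
    # Java permits a trailing comma inside an array initializer.
    return (
        f"    static final byte[] {const_name} = {{\n"
        f"{body}\n"
        f"    }};\n"
    )
```

The test would then hand that array to a custom `ClassLoader` subclass (via its protected `defineClass`), and any change to the bytecode arrives as a human-readable diff in the commit email.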

> Now, that's not really all that different from a test-time security
> standpoint from having the .class file in a test dir outside the classpath
> or renaming `Foo.class` to `Foo.dat` or `Foo.bin`. It is better though from
> an auditing perspective because when changes are made there will be a
> human-readable diff in the commit notification email.
>
> And that brings me to the bundling of SparkR in Toree. The standard
> procedure would be for the user to fetch that dependency themselves. By
> embedding it, we actually make it *harder* for security-minded consumers to
> understand where their dependencies are coming from.
>

Looking at their "package-sparkR.sh" script, it seems like it should be
straightforward to switch to a directory rather than a tar.gz file.

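As a rough sketch of that switch, assuming the R package sources are committed as a plain directory and the archive is generated during the build instead of being checked in (the function name and paths are hypothetical; I haven't checked how package-sparkR.sh is actually structured):

```python
import tarfile
from pathlib import Path


def build_sparkr_bundle(src_dir: str, out_file: str) -> list[str]:
    """Create out_file (a .tar.gz) from the checked-in src_dir at build time.

    Returns the archive's member names so the build can sanity-check them.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"expected a source directory: {src}")
    # Member paths are rooted at the directory's own name, e.g. "R/pkg/...".
    with tarfile.open(out_file, "w:gz") as tar:
        tar.add(str(src), arcname=src.name)
    with tarfile.open(out_file, "r:gz") as tar:
        return sorted(tar.getnames())
```

The archive then never lives in version control, so every change to the R sources is reviewable as an ordinary text diff.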

>
> I don't see a strong rationale for bundling this dependency.  It isn't
> compiled code, it's compressed source -- but when it's updated, there's no
> diff because the tar.gz is binary.  Why not treat it like any other
> dependency?
>
> Marvin Humphrey
>
