On Sun, Jan 22, 2017 at 10:24 PM Marvin Humphrey <mar...@rectangular.com> wrote:
> On Sat, Jan 21, 2017 at 9:34 AM, John D. Ament <johndam...@apache.org> wrote:
> > On Sat, Jan 21, 2017 at 12:19 PM Marvin Humphrey <mar...@rectangular.com> wrote:
> >
> >> On Sat, Jan 21, 2017 at 6:41 AM, John D. Ament <john.d.am...@gmail.com> wrote:
> >> > However, regarding the binaries. In a recent discussion (on
> >> > legal-discuss) it was decided that this was OK. Ideally the NOTICE
> >> > would include the information on the binary's source of origin
> >> > (assuming that the source was eligible to be licensed this way).
> >> > In this case, the .tar.gz is actually the distribution of Apache
> >> > Spark R that looks like its required to build Toree.
> >>
> >> I must have missed this on legal-discuss, and it's counter to my
> >> understanding. Can you please provide a link?
> >>
> >> Here is something I wrote to legal-discuss recently, which talks
> >> about some of the security reasons why bundling a binary dependency
> >> is problematic: https://s.apache.org/OuNX
> >>
> >
> > Same thread. Specifically Mark T's response [1] and Craig's
> > affirmation [2]
> >
> > [1]:
> > https://lists.apache.org/thread.html/995d9ddda07363faff5306154ff3a3aa100a07aad191785d866ae097@%3Clegal-discuss.apache.org%3E
> > [2]:
> > https://lists.apache.org/thread.html/5f10a28e5f7bf117599d35e14a00290453c1741d614605950ca897c1@%3Clegal-discuss.apache.org%3E
>
> Let me be clear: compiled code does not belong in our official source
> releases.
>
> Here's the relevant policy clause:
>
> http://www.apache.org/legal/release-policy#compiled-packages
>

I interpret the term "compiled code" to mean compiled versions of the
source code within the package. When I look at Toree's package I see
three different use cases.

1. The first is exactly what OWB was doing. They're doing dynamic JAR
loading and want to test that you can load a class in a JAR.
That covers these files:

./scala-interpreter/src/test/resources/ScalaTestJar.jar
./scala-interpreter/src/test/resources/TestJar.jar
./scala-interpreter/src/test/resources/TestJar2.jar

2. There are external dependencies for executing tests. Packaging an
external JAR seems to be a convenience. It does look like it originates
from Apache Spark, and it was fixed on their side back in July:

https://issues.apache.org/jira/browse/SPARK-10683

So it may be that Toree needs to include a similar fix.

sparkr-interpreter/src/main/resources/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar

3. The R source code provided by Apache Spark's R module, the tar.gz
file.

sparkr-interpreter/src/main/resources/sparkr_bundle.tar.gz

For the last two, I have no clear understanding of how they're used in
the build, so I can't say much beyond asking that the source files be
included in the distribution somewhere and that the build structure be
updated to create these files dynamically.

I'll point out that situations like this are exactly why I pushed to
remove most of the incubator-specific release policy and instead push
for improved foundation-wide policy. I suspect that Toree did all of
this in their release package because Apache Spark was already doing
it, they were leveraging Spark functionality, and if a TLP is doing it,
it must be correct.

> The Apache Software Foundation produces open source software. All
> releases are in the form of the source materials needed to make
> changes to the software being released.
>
> Creating releases which adhere to this policy is almost always
> straightforward. Just because there are some edge cases where we have
> to apply judgment doesn't invalidate the policy and allow willy-nilly
> bundling of binaries.
>
> The OpenWebBeans case from legal-discuss was just such an edge case.
> The .class file wasn't on the class path and was used only when
> running unit tests for some bytecode stuff. This is quite difficult
> to exploit.
>
> The debate on legal-discuss was over whether it was worth doing
> anything about, because it was more of a binary resource (like a
> .jpeg) than compiled object form. In the end we didn't even make a
> policy exception because the project applied a workaround -- they
> extracted the bytecode out of the .class file and encoded it as a
> static variable in the source file.
>

This wasn't my interpretation of the conversation on legal-discuss. It
may be that we need to circle back on that thread and get a definitive
answer.

> Now, that's not really all that different from a test-time security
> standpoint from having the .class file in a test dir outside the
> classpath or renaming `Foo.class` to `Foo.dat` or `Foo.bin`. It is
> better though from an auditing perspective because when changes are
> made there will be a human-readable diff in the commit notification
> email.
>
> And that brings me to the bundling of SparkR in Toree. The standard
> procedure would be for the user to fetch that dependency themselves.
> By embedding it, we actually make it *harder* for security-minded
> consumers to understand where their dependencies are coming from.
>

Looking at their "package-sparkR.sh" script, it seems like it should be
straightforward to switch to a directory rather than a tar.gz file.

>
> I don't see a strong rationale for bundling this dependency. It isn't
> compiled code, it's compressed source -- but when it's updated,
> there's no diff because the tar.gz is binary. Why not treat it like
> any other dependency?
>
> Marvin Humphrey
>
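As an aside for anyone reviewing a release candidate: files like the
ones listed above can be surfaced with a short script. Below is a
minimal Python sketch. The suffix list and the function name
find_bundled_artifacts are my own assumptions for illustration, not
part of any ASF tooling or policy, and a filename match only flags a
file for human review; it says nothing about whether the file is
acceptable in a source release.

```python
import os

# Hypothetical suffix list chosen for this sketch; it is not an official
# ASF definition of "compiled code" or of a binary artifact.
SUSPECT_SUFFIXES = (".jar", ".class", ".tar.gz", ".tgz", ".zip")


def find_bundled_artifacts(root):
    """Return sorted paths, relative to root, whose names end in a suspect suffix."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # str.endswith accepts a tuple, so one call checks every suffix.
            if name.lower().endswith(SUSPECT_SUFFIXES):
                full = os.path.join(dirpath, name)
                # Normalize separators so output is the same on every platform.
                hits.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(hits)
```

Calling find_bundled_artifacts on an unpacked source release returns
the candidate paths, which a reviewer can then check against NOTICE
and the build scripts.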