I've got an initial PR for this here: https://github.com/apache/incubator-livy/pull/367
It uses Ubuntu Xenial (similar to our Travis environment) and installs the
necessary Python packages and R. I have not added Spark to the package as it
gets pulled down as a dependency, but am considering doing that since it's
such a large download.

I've got another version of the Dockerfile based off of
"maven:3-jdk-8-alpine" that works as well. It's quite a bit smaller than the
xenial version (521MB vs. 1.18GB), but I'm still figuring out exactly how I
want the Docker image to work.

A couple of open questions:
- Is it important to have the Dockerfile as close to Travis as possible?
- Will alpine-based images be tough to maintain/support?
- Will we publish some version of the resulting images? If so, size could be
  important.

Damon

On 2022/12/03 01:20:45 Damon Cortesi wrote:
> Coming back to this as I get my dev environment up and running, there's
> definitely an intermix of dependencies between Spark, Python, and R that I'm
> still working out.
>
> For example, when I try to start sparkR I get an error message that "package
> ‘SparkR’ was built under R version 4.0.4", but locally I have R version 3.5.2
> installed. Spark 3.3.1 says you need R 3.5+. That said, think my version of R
> works with Spark2 (at least the tests indicate that...)
>
> It'd be great to have a minimum viable environment with specific versions and
> I hope to have that in a Docker environment by early next week. :)
>
> Currently I'm just basing it off a debian image with Java8, although there
> are Spark images that could be useful...
>
> Damon
>
> On 2022/11/20 18:55:35 larry mccay wrote:
> > Considering there is no download for anything older than 3.2.x on the
> > referred download page, we likely need some change to the README.md to
> > reflect a more modern version.
> > We also need more explicit instructions for installing Spark than just the
> > download. Whether we detail this or point to Spark docs that are sufficient
> > is certainly a consideration.
> >
> > At the end of the day, we are missing any sort of quick start guide for
> > devs to be able to successfully build and/or run tests.
> >
> > Thoughts?
> >
> > On Sat, Nov 19, 2022 at 6:23 PM larry mccay <[email protected]> wrote:
> >
> > > Hey Folks -
> > >
> > > Our Livy README.md indicates the following:
> > >
> > > To run Livy, you will also need a Spark installation. You can get Spark
> > > releases at https://spark.apache.org/downloads.html.
> > >
> > > Livy requires Spark 2.4+. You can switch to a different version of Spark
> > > by setting the SPARK_HOME environment variable in the Livy server
> > > process, without needing to rebuild Livy.
> > >
> > > Do we have any variation on this setup at this point in the real world?
> > >
> > > What do your dev environments actually look like and how are you
> > > installing what versions of Spark as a dependency?
> > >
> > > Thanks!
> > >
> > > --larry
> > >
> >
>
