[jira] [Commented] (PIO-30) Cross build for different versions of scala and spark

ASF GitHub Bot (JIRA) Thu, 01 Sep 2016 12:53:19 -0700

    [ https://issues.apache.org/jira/browse/PIO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456437#comment-15456437 ]


ASF GitHub Bot commented on PIO-30:
-----------------------------------

GitHub user Ziemin opened a pull request:

    https://github.com/apache/incubator-predictionio/pull/288

    [PIO-30] Set up a cross build for Scala 2.10 (Spark 1.6.2) and Scala …

    I treat this PR as an RFC to get some insight into what could be improved 
and what would be best for the community. I don't think it is ready to merge, 
and it should not be part of the upcoming release, but it is an example of a 
working cross-build setup.
    
    ### The good
    The key changes include:
    * `build.sbt` - here I set two versions of Scala for the cross build. 
Depending on the current Scala version, the appropriate versions of Spark, Akka 
and Hadoop are chosen. To run an sbt command for both setups at once, simply 
type, e.g., `sbt +test`. To choose Scala 2.10.5 specifically, write, e.g., 
`sbt ++2.10.5 scalastyle`. Running sbt without a version switch always uses 
2.11.8.
    
    * `data/src/main/scala-2.10/org/apache/predictionio/data/SparkVersionDependent.scala` 
and `data/src/main/scala-2.11/org/apache/predictionio/data/SparkVersionDependent.scala` 
are the only examples of version-dependent code. They exist solely to provide 
an object of the proper type for Spark SQL related actions. sbt automatically 
includes version-specific source paths like these.
    
    * `make_distribution.sh` - to create an archive for Scala 2.10.5, pass the 
version as an argument (`./make_distribution.sh 2.10.5`). By default it uses 
2.11.8.
    
    * **integration tests** - The Docker image is updated; I pushed it with the 
tag spark_2.0.0 so as not to interfere with the current build. It contains both 
versions of Spark and, on startup, sets up the environment according to the 
dependencies that PredictionIO was built with. It uses a simple Java program, 
`tests/docker-files/BuildInfoPrinter.java`, linked with the PredictionIO 
assembly to acquire the necessary information. Travis CI makes use of this 
setup and runs 8 parallel builds; the number doubled because of the two 
Scala versions.
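
    The version-specific source layout and the size of the Travis matrix 
described above can be modelled roughly as follows. This is only an 
illustrative Python sketch, not code from the PR: `binary_version`, 
`source_dirs` and `build_matrix` are hypothetical names, and the four 
`config_N` entries are placeholders, since the actual Travis configurations are 
not listed here.

```python
from itertools import product

# Scala versions named in the PR description; 2.11.8 is the default.
SCALA_VERSIONS = ["2.10.5", "2.11.8"]
DEFAULT_SCALA = "2.11.8"

# Hypothetical placeholders for the four configurations that existed
# before the second Scala version was introduced.
BASE_CONFIGS = ["config_1", "config_2", "config_3", "config_4"]


def binary_version(scala_version):
    """Return the major.minor part of a version, e.g. '2.10.5' -> '2.10'."""
    return ".".join(scala_version.split(".")[:2])


def source_dirs(module, scala_version=DEFAULT_SCALA):
    """Shared sources plus the directory for the Scala binary version."""
    base = f"{module}/src/main"
    return [f"{base}/scala", f"{base}/scala-{binary_version(scala_version)}"]


def build_matrix():
    """Each base configuration is built once per Scala version."""
    return list(product(BASE_CONFIGS, SCALA_VERSIONS))
```

Here `source_dirs("data", "2.10.5")` gives the shared `data/src/main/scala` 
directory plus `data/src/main/scala-2.10`, and `build_matrix()` has eight 
entries.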
    
    ### The bad
    Updating Spark caused some trouble:
    * The classpath has to be extended to run the unit tests successfully for 
the data sub-package. (see `build.sbt`)
    * Column names have to be handled differently for Postgres in 
`JDBCPEvents`, as Spark surrounds them with "...", which breaks the current 
schema in this case.
    * `tests/pio_tests/utils.py` - has a special Spark pass-through argument to 
set `spark.sql.warehouse.dir`, because the defaults cause runtime exceptions. 
See 
[here](https://mail-archives.apache.org/mod_mbox/spark-user/201608.mbox/%3ccamassd+efz+uscmnzvkfp00qbr9ynv8lrfhvz9lrmnwh2vk...@mail.gmail.com%3E)
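
    A minimal sketch of such a pass-through argument, assuming that everything 
after a `--` separator on the `pio` command line is forwarded to 
`spark-submit` and that the setting is supplied with `--conf`. The helper names 
and the example directory are hypothetical and are not taken from 
`tests/pio_tests/utils.py`.

```python
def spark_passthrough_args(warehouse_dir):
    """Build spark-submit arguments that set spark.sql.warehouse.dir."""
    return ["--conf", f"spark.sql.warehouse.dir={warehouse_dir}"]


def pio_command(subcommand, warehouse_dir):
    """Append the pass-through arguments after the `--` separator.

    Assumption: arguments after `--` are handed to spark-submit unchanged.
    """
    return ["pio", subcommand, "--"] + spark_passthrough_args(warehouse_dir)


if __name__ == "__main__":
    # Hypothetical warehouse location used only for illustration.
    print(" ".join(pio_command("train", "/tmp/spark-warehouse")))
```

Returning the command as a list keeps it usable with `subprocess.run` without 
extra shell quoting.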
    
    ### The ugly
    
    I haven't updated the install script. I think there are too many places 
containing version strings and related dependencies. The same goes for setting 
up and downloading the proper version of Spark in the tests. I think we should 
come up with a cleaner way of choosing a profile, one that is consistent across 
the different scripts and easier to maintain. Currently, bumping the version of 
Scala or Spark in one place involves modifying many files and is very 
error-prone. Ideally there should be one place with a description of a profile 
that all the other scripts could use for the setup. 
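
    One possible shape for such a single profile description, sketched in 
Python under stated assumptions: the JSON layout, the profile names and the 
`load_profile` helper are all hypothetical. Only the Scala and Spark version 
pairs come from this PR; other dependency versions are left out because they 
are not stated here.

```python
import json

# Hypothetical contents of a shared profile file that every script would read.
PROFILES_JSON = """
{
  "default": "scala-2.11",
  "profiles": {
    "scala-2.10": {"scala": "2.10.5", "spark": "1.6.2"},
    "scala-2.11": {"scala": "2.11.8", "spark": "2.0.0"}
  }
}
"""


def load_profile(name=None, text=PROFILES_JSON):
    """Return the named profile, or the default one when no name is given."""
    data = json.loads(text)
    key = name or data["default"]
    try:
        return data["profiles"][key]
    except KeyError:
        known = sorted(data["profiles"])
        raise ValueError(f"unknown profile {key!r}; expected one of {known}") from None
```

With something like this, bumping a version would mean editing one entry, and 
an unknown profile name fails loudly instead of silently falling back.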
    
    I hope that this PR will lead to some discussion and we will finally devise 
a better solution.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Ziemin/incubator-predictionio crossbuild

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-predictionio/pull/288.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #288
    
----
commit 85c06aa3cae53ca8158db34e3f5c6788e06b38cf
Author: Marcin Ziemiński
Date:   2016-08-09T21:20:54Z

    [PIO-30] Set up a cross build for Scala 2.10 (Spark 1.6.2) and Scala 2.11
    (Spark 2.0.0).
    
    Changes also include updating travis integration tests, which run now
    eight parallel builds.

----


> Cross build for different versions of scala and spark
> -----------------------------------------------------
>
>                 Key: PIO-30
>                 URL: https://issues.apache.org/jira/browse/PIO-30
>             Project: PredictionIO
>          Issue Type: Improvement
>            Reporter: Marcin Ziemiński
>
> The present version of Scala is 2.10 and Spark is 1.4, which is quite old. 
> Spark 2.0.0 brings many performance improvements and features that people 
> will definitely want to use in their templates. I am also aware that the past 
> cannot be ignored, and simply dropping 1.x might not be an option for other 
> users. 
> I propose setting up a cross build in sbt to build with Scala 2.10 and Spark 
> 1.6, and a separate one for Scala 2.11 and Spark 2.0. Most of the files will 
> be consistent between versions, including the API. The problematic ones will 
> be divided between additional source directories: src/main/scala-2.10/ and 
> src/main/scala-2.11/. The dockerized tests should also take the two versions 
> into consideration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
