Hi Marcin,

Personally, I vote for adding integration tests. Thanks for the proposal. One
suggestion is about the test scenarios: IMHO there is no need to add the
recommendation-engine template inside the PredictionIO codebase. Why not use
the pio template command to download it from GitHub when testing?
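For instance, the test script could fetch the template on the fly instead of vendoring it. A minimal sketch, where the repository name is an assumption and should be replaced with the actual template repo:

```shell
# Sketch: fetch the engine template at test time instead of vendoring it.
# TEMPLATE_REPO is an assumed name; point it at the real template repository.
TEMPLATE_REPO="apache/incubator-predictionio-template-recommender"
TEMPLATE_URL="https://github.com/${TEMPLATE_REPO}.git"
echo "$TEMPLATE_URL"
# Inside the test container one would then run, e.g.:
#   git clone --depth 1 "$TEMPLATE_URL" MyRecommendation
```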

Another concern is the "docker pull ziemin/pio-testing" step in the Travis
config file, while there is also a testing/Dockerfile that starts from ubuntu.
So I think we should either build from that Dockerfile (i.e. "docker pull
ubuntu"), or use a pre-built testing Docker image instead of the Dockerfile.
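To make the second option concrete, the Travis config would then pull a versioned, pre-built image up front; the image name and tag below are hypothetical:

```yaml
# .travis.yml fragment (hypothetical image name and tag)
before_install:
  - docker pull predictionio/pio-testing:0.1
```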

Best
Xusen Yin

> On Jul 22, 2016, at 2:52 PM, Marcin Ziemiński <[email protected]> wrote:
> 
> Hi!
> 
> I have a feeling that PredictionIO is lacking integration tests. TravisCI
> currently runs only the unit tests residing in the repository. Better tests
> are important not only for keeping up the quality of the project, but also
> for the sheer comfort of development. Therefore, I tried to come up with a
> simple basis for adding and building tests.
> 
>   - Integration tests should be agnostic to environment settings (it
>   should not matter whether we use Postgres or HBase)
>   - They should be easy to run for developers and the configuration should
>   not pollute their working space
> 
> I have pushed a sequence of commits to my personal fork and ran travis
> builds on them - Diff with upstream
> <https://github.com/apache/incubator-predictionio/compare/develop...Ziemin:testing-infrastructure>
> 
> The following changes were introduced:
> 
>   - A dedicated Docker image was prepared. This image fetches and prepares
>   some possible dependencies for PredictionIO - postgres, hbase, spark,
>   elasticsearch.
>   Upon container initialization all services are started, including a Spark
>   standalone cluster. The best way to start it is to use the
>   testing/run_docker.sh script, which binds relevant ports and mounts shared
>   directories with the ivy2 cache and PredictionIO's code repository. More
>   importantly, it sets up pio's configuration, e.g.:
>   $ /run_docker.sh PGSQL HBASE HDFS ~/projects/incubator-predictionio
>   '/pio_host/testing/simple_scenario/run_scenario.sh'
>   This command sets the metadata repo to PGSQL, event data to HBASE and
>   model data to HDFS. The last two arguments are the path to the repo and a
>   command to run from inside the container.
>   An important thing to note is that the container expects a tar with the
>   built distribution to be found in the shared /pio_host directory, which is
>   later unpacked.
>   The user can then safely execute all pio ... commands. By default the
>   container pops up a bash shell if not given any other command.
>   - Currently there is only one simple test added, which is just a copy of
>   the steps mentioned in the quickstart tutorial.
>   - .travis.yml was modified to run 4 concurrent builds: one for unit
>   tests as previously and three integration tests for various combinations of
>   services
>   env:
>     global:
>       - PIO_HOME=`pwd`
> 
>     matrix:
>       - BUILD_TYPE=Unit
>       - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL
>   MODELDATA_REP=PGSQL
>       - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH
>   EVENTDATA_REP=HBASE MODELDATA_REP=LOCALFS
>       - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH
>   EVENTDATA_REP=PGSQL MODELDATA_REP=HDFS
>   Here you can find the build logs: travis logs
>   <https://travis-ci.org/Ziemin/incubator-predictionio/builds/146753806>
>   What is more, to make build times shorter, ivy jars are now cached on
>   travis, so that subsequent builds can resolve them faster.
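The backend selection described above presumably boils down to exporting PredictionIO's storage variables inside the container. A minimal sketch, assuming the standard PIO_STORAGE_* naming convention; how run_docker.sh actually wires this up may differ:

```shell
# Hedged sketch: map a repository kind and backend to a pio-env.sh setting.
# The PIO_STORAGE_REPOSITORIES_*_SOURCE names follow PredictionIO's
# convention; the exact mapping inside run_docker.sh is an assumption.
set_storage() {
  # $1 = repository (METADATA | EVENTDATA | MODELDATA), $2 = backend
  echo "PIO_STORAGE_REPOSITORIES_${1}_SOURCE=${2}"
}

# Mirrors the example invocation: PGSQL HBASE HDFS
set_storage METADATA PGSQL
set_storage EVENTDATA HBASE
set_storage MODELDATA HDFS
```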
> 
> The current setup lets developers run tests for different environment
> settings in an easy and deterministic way, as well as use travis or other
> CI tools more conveniently. What is left to do now is to prepare a sensible
> set of different tests written in a concise and extensible way. I think
> that ideally we could use the Python API and add to it a small library
> focused strictly on our testing purposes.
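As a sketch of what such a small Python testing helper might look like (everything here is hypothetical, not an existing PredictionIO API), a scenario could be expressed as a list of pio commands driven by one runner:

```python
# Hypothetical sketch of a tiny helper library for scripting pio scenarios.
# The command names mirror the quickstart steps; the helper functions
# themselves are assumptions, not an existing API.
import subprocess


def pio_cmd(subcommand, *args):
    """Build the argv list for a `pio` CLI invocation."""
    return ["pio", subcommand, *args]


def run(cmd, dry_run=False):
    """Run a command, or just return the argv when dry_run is set."""
    if dry_run:
        return cmd
    subprocess.run(cmd, check=True)
    return cmd


# The quickstart scenario, expressed as a sequence of commands:
scenario = [pio_cmd("build"), pio_cmd("train"), pio_cmd("deploy")]
```

A test runner could then iterate over such scenarios for each storage backend combination, keeping individual tests short and declarative.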
> 
> Any insights would be invaluable.
> 
> 
> Regards,
> 
> -- Marcin
