Hi Marcin,

Personally, I vote for adding integration tests. Thanks for the proposal.

One suggestion is about the test scenarios. IMHO there is no need to add
the recommendation-engine template inside the PredictionIO codebase. Why
not use the pio template command to download it from GitHub when testing?
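For example, the Travis setup could fetch the template at test time
instead of vendoring a copy. This is only a sketch: the repository name
and target directory below are assumptions, not something from the
proposal, and the git clone variant avoids depending on the pio CLI:

```shell
# Hypothetical test-setup step: fetch the recommendation template
# from GitHub rather than keeping a copy in the PredictionIO repo.
# Repository name and target directory are assumptions.
pio template get apache/incubator-predictionio-template-recommender \
    MyRecommendation

# Alternatively, a plain shallow clone works without the pio CLI:
git clone --depth 1 \
    https://github.com/apache/incubator-predictionio-template-recommender.git \
    MyRecommendation
```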
Another concern is the "docker pull ziemin/pio-testing" step in the Travis
config file, while testing/Dockerfile starts from ubuntu. I think we should
either use "docker pull ubuntu", or use a pre-built testing Docker image
instead of building from the Dockerfile.

Best,
Xusen Yin

> On Jul 22, 2016, at 2:52 PM, Marcin Ziemiński <[email protected]> wrote:
>
> Hi!
>
> I have a feeling that PredictionIO is lacking integration tests. TravisCI
> runs only the unit tests residing in the repository. Better tests are
> important not only for keeping up the quality of the project, but also for
> the sheer comfort of development. Therefore, I tried to come up with a
> simple basis for adding and building tests:
>
> - Integration tests should be agnostic to environment settings (it
>   should not matter whether we use Postgres or HBase)
> - They should be easy to run for developers, and the configuration should
>   not pollute their working space
>
> I have pushed a sequence of commits to my personal fork and ran Travis
> builds on them - Diff with upstream:
> <https://github.com/apache/incubator-predictionio/compare/develop...Ziemin:testing-infrastructure>
>
> The following changes were introduced:
>
> - A dedicated Docker image was prepared. This image fetches and prepares
>   possible dependencies of PredictionIO - postgres, hbase, spark,
>   elasticsearch.
>   Upon container initialization all services are started, including a
>   Spark standalone cluster. The best way to start it is the
>   testing/run_docker.sh script, which binds the relevant ports and mounts
>   shared directories with the ivy2 cache and PredictionIO's code
>   repository. More importantly, it sets up pio's configuration, e.g.:
>
>   $ ./run_docker.sh PGSQL HBASE HDFS ~/projects/incubator-predictionio \
>       '/pio_host/testing/simple_scenario/run_scenario.sh'
>
>   This command sets the metadata repo to PGSQL, event data to HBASE and
>   model data to HDFS.
>   The last two arguments are the path to the repo and a command to run
>   from inside the container.
>   An important thing to note is that the container expects a tar with the
>   built distribution to be found in the shared /pio_host directory, which
>   is later unpacked.
>   Users can then safely execute all pio ... commands. By default the
>   container pops up a bash shell if not given any other command.
> - Currently there is only one simple test added, which is just a copy of
>   the steps mentioned in the quickstart tutorial.
> - .travis.yml was modified to run 4 concurrent builds: one for unit
>   tests as previously, and three integration builds for various
>   combinations of services:
>
>   env:
>     global:
>       - PIO_HOME=`pwd`
>
>     matrix:
>       - BUILD_TYPE=Unit
>       - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL
>       - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=HBASE MODELDATA_REP=LOCALFS
>       - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=PGSQL MODELDATA_REP=HDFS
>
>   Here you can find the build logs: travis logs
>   <https://travis-ci.org/Ziemin/incubator-predictionio/builds/146753806>
>   What is more, to make build times shorter, ivy jars are now cached on
>   Travis, so they are fetched faster in subsequent builds.
>
> The current setup lets developers run tests for different environment
> settings in a deterministic way, as well as use Travis or other CI tools
> more conveniently. What is left to do now is to prepare a sensible set of
> tests written in a concise and extensible way. I think that ideally we
> could use the Python API and add to it a small library focused strictly on
> our testing purposes.
>
> Any insights would be invaluable.
>
>
> Regards,
>
> -- Marcin
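On Marcin's last point, the small Python testing library could start as
something like the sketch below. All names here are hypothetical - none
of them exist in PredictionIO - and it only illustrates the shape of a
scenario runner that executes a sequence of pio commands and aborts on
the first failure:

```python
# Hypothetical sketch of a tiny scenario runner for integration tests.
# The class and method names are illustrative only, not part of any
# existing PredictionIO API.
import subprocess


class ScenarioError(Exception):
    """Raised when a step of an integration scenario fails."""


class Scenario:
    def __init__(self, name):
        self.name = name
        self.steps = []          # list of command argument lists

    def step(self, *cmd):
        """Register a shell command (e.g. a pio invocation) as a step."""
        self.steps.append(list(cmd))
        return self

    def run(self, runner=subprocess.call):
        """Execute the steps in order; abort on the first failing one.

        `runner` takes an argument list and returns an exit code; it is
        injectable so scenarios can be unit-tested without real commands.
        """
        for cmd in self.steps:
            if runner(cmd) != 0:
                raise ScenarioError(
                    "%s: step failed: %s" % (self.name, " ".join(cmd)))
```

A quickstart-style scenario would then read as a chain of steps, e.g.
Scenario("quickstart").step("pio", "build").step("pio", "train")
.step("pio", "deploy").run(), keeping each test concise and the list of
steps easy to extend.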
