Hey guys,

This proposal to add integration tests is awesome!

Echoing Xusen, I recall Pat suggested we could remove pio template get, so
it should be okay to just git clone the template from somewhere. I think
Marcin is including the template now because currently templates still use
artifacts under the old io.prediction package namespace.
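
Just to illustrate, a scenario could derive the clone URL itself. A rough
sketch - the helper name, the default org, and the repo naming scheme are
all assumptions, not confirmed locations:

```shell
#!/bin/sh
# Sketch: build a GitHub clone URL for an engine template so a test
# scenario can `git clone` it instead of running `pio template get`.
# The default org and the repo naming scheme below are assumptions.
template_repo_url() {
  echo "https://github.com/${TEMPLATE_ORG:-apache}/${1}.git"
}

# Usage (network required for the actual clone):
#   git clone "$(template_repo_url incubator-predictionio-template-recommender)" MyRecommendation
```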

Regards,
Donald

On Friday, July 22, 2016, Xusen Yin <[email protected]> wrote:

> Hi Marcin,
>
> Personally I vote for adding integration tests. Thanks for the proposal.
> One suggestion is about the test scenarios. IMHO there is no need to add
> the recommendation-engine template inside the PredictionIO codebase. Why
> not use pio template to download it from GitHub when testing?
>
> Another concern is the docker pull ziemin/pio-testing in the Travis config
> file, while there is also a testing/Dockerfile which starts from ubuntu. So
> I think we should either use docker pull ubuntu, or use a pre-built testing
> Docker image instead of the Dockerfile.
>
> Best
> Xusen Yin
>
> > On Jul 22, 2016, at 2:52 PM, Marcin Ziemiński <[email protected]> wrote:
> >
> > Hi!
> >
> > I have a feeling that PredictionIO is lacking integration tests. TravisCI
> > runs only the unit tests residing in the repository. Better tests are
> > important not only for keeping the quality of the project high, but also
> > for the sheer comfort of development. Therefore, I tried to come up with
> > a simple basis for adding and building tests.
> >
> >   - Integration tests should be agnostic to environment settings (it
> >   should not matter whether we use Postgres or HBase)
> >   - They should be easy to run for developers, and the configuration
> >   should not pollute their working space
> >
> > I have pushed a sequence of commits to my personal fork and ran Travis
> > builds on them - Diff with upstream:
> > <https://github.com/apache/incubator-predictionio/compare/develop...Ziemin:testing-infrastructure>
> >
> > The following changes were introduced:
> >
> >   - A dedicated Docker image was prepared. This image fetches and
> >   prepares possible dependencies of PredictionIO - Postgres, HBase,
> >   Spark, Elasticsearch. Upon container initialization all services are
> >   started, including a Spark standalone cluster. The best way to start it
> >   is the testing/run_docker.sh script, which binds the relevant ports and
> >   mounts shared directories with the ivy2 cache and PredictionIO's code
> >   repository. More importantly, it sets up pio's configuration, e.g.:
> >   $ /run_docker.sh PGSQL HBASE HDFS ~/projects/incubator-predictionio
> >   '/pio_host/testing/simple_scenario/run_scenario.sh'
> >   This command sets the metadata repository to PGSQL, event data to HBASE
> >   and model data to HDFS. The last two arguments are the path to the repo
> >   and a command to run from inside the container.
> >   An important thing to note is that the container expects a tar with the
> >   built distribution in the shared /pio_host directory, which is later
> >   unpacked. The user can then safely execute all pio ... commands. By
> >   default the container pops up a bash shell if not given any other
> >   command.
> >   - Currently there is only one simple test added, which is just a copy
> >   of the steps mentioned in the quickstart tutorial.
> >   - .travis.yml was modified to run 4 concurrent builds: one for unit
> >   tests as previously, and three integration tests for various
> >   combinations of services:
> >   env:
> >     global:
> >       - PIO_HOME=`pwd`
> >
> >     matrix:
> >       - BUILD_TYPE=Unit
> >       - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL
> >       - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=HBASE MODELDATA_REP=LOCALFS
> >       - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=PGSQL MODELDATA_REP=HDFS
> >   Here you can find the build logs: travis logs
> >   <https://travis-ci.org/Ziemin/incubator-predictionio/builds/146753806>
> >   What is more, to make build times shorter, ivy jars are now cached on
> >   Travis, so that subsequent builds fetch them faster.
> >
> > The current setup lets developers run tests for different environment
> > settings in an easy, deterministic way, as well as use Travis or other CI
> > tools more conveniently. What is left to do now is to prepare a sensible
> > set of different tests written in a concise and extensible way. I think
> > that ideally we could use the Python API and add a small library to it
> > focused strictly on our testing purposes.
> >
> > Any insights would be invaluable.
> >
> >
> > Regards,
> >
> > -- Marcin
>
>
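
As an aside, replaying the three Travis combinations locally should boil
down to looping over the matrix. A rough sketch - run_docker.sh and the
scenario path are taken from Marcin's description; printing the commands
instead of running them keeps this a dry run:

```shell
#!/bin/sh
# Sketch: print the run_docker.sh invocations for the three backend
# combinations from the quoted .travis.yml matrix. Pipe the output to
# `sh` to actually run them (requires Docker and the built distribution).
run_all_combinations() {
  # $1 = path to the PredictionIO repo, $2 = scenario script inside the container
  for combo in \
      "PGSQL PGSQL PGSQL" \
      "ELASTICSEARCH HBASE LOCALFS" \
      "ELASTICSEARCH PGSQL HDFS"; do
    echo "testing/run_docker.sh $combo $1 '$2'"
  done
}

# Usage:
#   run_all_combinations ~/projects/incubator-predictionio \
#       /pio_host/testing/simple_scenario/run_scenario.sh | sh
```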
