Very nice step, and I agree with the no-Scala approach.

From my experience answering questions on several forums for users over the
last year: PIO users are not data scientists, and they are definitely not
Scala programmers. Many can probably deal with Python.

Also, what is the use for extending the tests? I suspect the first thing a user
will think about is their template with their data. So we want to encourage
template designers to add extensions that cover their template with some sample
data.

Where is the code? In other Apache projects we would put up a PR for 
discussions like this, even if the code is not really targeted for release—it 
makes code review and questions easier. Is there sample output?

On Jul 28, 2016, at 4:23 PM, Marcin Ziemiński <[email protected]> wrote:

Together with @Chan, we have decided that for now writing tests in Python is
easier and more flexible.
We have created a simple framework for adding tests executed in unit-test
fashion. Creating tests resembles writing shell scripts, but the framework
helps you avoid repeating yourself thanks to a few utility functions and
shared rules, e.g. for directory structure.
I have already implemented two simple tests and Chan is currently working
on testing eventserver functionality.
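To illustrate the "shell scripts in unit-test fashion" idea, here is a minimal sketch of the pattern. This is not the actual framework code: `run_cmd` stands in for the shared utility functions, and `echo` stands in for real `pio` commands.

```python
import subprocess
import unittest

def run_cmd(*args):
    """Utility in the spirit of the framework's helpers: run a shell
    command, fail loudly on a non-zero exit code, return its stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout

class PingTest(unittest.TestCase):
    """Shape of a test case; a real one would call `pio` subcommands
    (app new, import, build, train, deploy, ...) instead of `echo`."""
    def runTest(self):
        self.assertEqual(run_cmd("echo", "alive").strip(), "alive")
```

Because the helpers raise on any failing command, a test body stays as short as the shell script it replaces, while unittest provides the setup/teardown hooks and reporting.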

This is what the output can look like:

Running tests...
----------------------------------------------------------------------
 runTest (pio_tests.scenarios.quickstart_test.QuickStartTest) ... [INFO]
quickstart_test 2016-07-28 16:17:31,121: Setting up the engine
[INFO] quickstart_test 2016-07-28 16:17:33,663: Adding a new application
[INFO] quickstart_test 2016-07-28 16:17:36,109: Sending two test events
[INFO] quickstart_test 2016-07-28 16:17:36,390: Checking the number of
events stored on the server
[INFO] quickstart_test 2016-07-28 16:17:36,424: Importing many events
[INFO] quickstart_test 2016-07-28 16:17:42,865: Checking the number of
events stored on the server after the update
[INFO] quickstart_test 2016-07-28 16:17:43,001: Building an engine...
[INFO] quickstart_test 2016-07-28 16:17:48,238: Training...
[INFO] quickstart_test 2016-07-28 16:17:58,613: Deploying and waiting 15s
for it to start...
[INFO] quickstart_test 2016-07-28 16:18:13,621: Sending a single query and
checking results
[INFO] quickstart_test 2016-07-28 16:18:13,799: Stopping deployed engine
[INFO] quickstart_test 2016-07-28 16:18:13,799: Deleting all related data
[INFO] quickstart_test 2016-07-28 16:18:15,037: Removing an app
OK (45.145s)
 runTest (pio_tests.scenarios.basic_app_usecases.BasicAppUsecases) ...
[INFO] basic_app_usecases 2016-07-28 16:18:16,266: Setting up the engine
[INFO] basic_app_usecases 2016-07-28 16:18:16,266: Adding a new application
[INFO] basic_app_usecases 2016-07-28 16:18:18,628: Creating an app again -
should fail
[INFO] basic_app_usecases 2016-07-28 16:18:19,824: Checking if app is on
the list
[INFO] basic_app_usecases 2016-07-28 16:18:21,005: Importing events
[INFO] basic_app_usecases 2016-07-28 16:18:21,189: Checking imported events
[INFO] basic_app_usecases 2016-07-28 16:18:21,194: Deleting entire data
[INFO] basic_app_usecases 2016-07-28 16:18:22,405: Checking if there are no
events at all
[INFO] basic_app_usecases 2016-07-28 16:18:22,412: Clean build
[INFO] basic_app_usecases 2016-07-28 16:18:35,398: Second build
[INFO] basic_app_usecases 2016-07-28 16:18:40,664: import some data first
[INFO] basic_app_usecases 2016-07-28 16:18:41,352: Training
[INFO] basic_app_usecases 2016-07-28 16:18:50,805: Deploying
[INFO] basic_app_usecases 2016-07-28 16:18:50,807: Importing more events
[INFO] basic_app_usecases 2016-07-28 16:18:53,654: Training again
[INFO] basic_app_usecases 2016-07-28 16:19:11,778: Check serving
[INFO] basic_app_usecases 2016-07-28 16:19:12,003: Remove data
[INFO] basic_app_usecases 2016-07-28 16:19:13,256: Retraining should fail
[INFO] basic_app_usecases 2016-07-28 16:19:19,581: Stopping deployed engine
[INFO] basic_app_usecases 2016-07-28 16:19:19,581: Deleting all related data
[INFO] basic_app_usecases 2016-07-28 16:19:20,774: Removing an app
OK (65.711s)

----------------------------------------------------------------------
Ran 2 tests in 110.856s

OK

Generating XML reports...


You can find the draft repository here:
https://github.com/chanlee514/pio-integration-test with a brief description.
The tests.py script can be executed within the environment provided by the
Docker image described before, and it is also run on every TravisCI build.

What do you think?

Regards,
Marcin



On Tue, 26.07.2016 at 12:04, Chan Lee <[email protected]> wrote:

> I think Marcin's proposal for integration testing and the use of Docker
> image is fantastic.
> 
> But instead of extending the python library, I'd like to suggest using
> ScalaTest FunSuite for integration tests. FunSuite is being used for unit
> testing, and I think it would be better practice to use the same framework
> for unit/integration testing. Also, there will be no need for
> cross-referencing external libraries. And considering the Scala-focused
> community around PredictionIO, it would be a better strategy for encouraging
> contribution in the long run.
> 
> In addition, I'd like to propose using packages
> org.scalatest.BeforeAndAfter for process hooks and sys.process for bash
> commands. For instance, we can begin testing the eventserver in a separate
> file with a simple process like below.
> 
> import org.scalatest.{BeforeAndAfter, FunSuite, Matchers}
> import scala.sys.process._
> 
> class EventserverSuite extends FunSuite with BeforeAndAfter with Matchers {
>   before {
>     // kill any process running on port 7070
>     // launch eventserver
>   }
>   test("Eventserver is successfully running") {
>     val response = "curl localhost:7070".!!
>     response.trim should be ("""{"status":"alive"}""")
>   }
>   after {
>     // shut down eventserver
>   }
> }
> 
> We can create separate files for testing eventserver, app (for pio app
> new/delete/...), and template(s). Maybe the directory structure can be
> something like this?
> root/
>  test/
>    integration/
>      EventserverSuite.scala
>      AppSuite.scala
>      SimilarProductTemplateSuite.scala
> 
>      seed/
>        json data for testing eventserver
>        json data for testing SimilarProductTemplate engine
> 
> 
> Please let me know what you think,
> Chan
> 
> 
> On Sat, Jul 23, 2016 at 7:34 AM, Pat Ferrel <[email protected]> wrote:
> 
>> +1 for integration tests. We have one for the UR that tests PIO too and
>> can be a starter. It is probably possible to create one that is template
>> independent that only tests the EventServer or some null template.
>> 
>> 
>> 
>> https://github.com/actionml/template-scala-parallel-universal-recommendation/blob/master/examples/integration-test
>> 
>> By looking at all the scripts called, you can see how we employ example
>> code in integration tests. We could try to encourage this as a pattern for
>> new templates and get great benefits.
>> 
>> On Jul 22, 2016, at 8:28 PM, Marcin Ziemiński <[email protected]> wrote:
>> 
>> I agree that engine templates should be downloaded separately by tests,
>> but as Donald mentioned, I included an engine and modified it so that it
>> could be built after the recent namespace modifications. Therefore, engine
>> templates should also be updated.
>> 
>> As far as Docker is concerned, I might not have been clear enough. The
>> Dockerfile you can find in the repository was used to build the image being
>> pulled on Travis. It is called ziemin/pio-testing because this is currently
>> a draft I pushed to my repository on Docker Hub. Moreover, this is the
>> essence of this proposal - it gives you a way to run tests in a
>> deterministic way in a prepared environment. Besides, it makes Travis
>> builds faster - instead of downloading all dependencies and installing
>> them every time, you just fetch one pre-built package that works out of
>> the box.
>> 
>> On Fri, 22.07.2016 at 16:37, Simon Chan <[email protected]> wrote:
>> 
>>> Integration tests are a great idea.
>>> 
>>> Yeah, I guess going forward a direct git clone to download templates may
>>> be a viable option.
>>> 
>>> Simon
>>> 
>>> On Friday, July 22, 2016, Donald Szeto <[email protected]> wrote:
>>> 
>>>> Hey guys,
>>>> 
>>>> This proposal of adding integration tests is awesome!
>>>> 
>>>> Echoing Xusen, I recall Pat suggested we could remove pio template get,
>>>> so it should be okay to just git clone the template from somewhere. I
>>>> think Marcin is including the template now because currently templates
>>>> still use artifacts under the old io.prediction package namespace.
>>>> 
>>>> Regards,
>>>> Donald
>>>> 
>>>> On Friday, July 22, 2016, Xusen Yin <[email protected]> wrote:
>>>> 
>>>>> Hi Marcin,
>>>>> 
>>>>> Personally I vote for adding integration tests. Thanks for the proposal.
>>>>> One suggestion is about the test scenarios. IMHO there is no need to add
>>>>> the recommendation-engine template inside the PredictionIO codebase. Why
>>>>> not use pio template to download them from GitHub when testing?
>>>>> 
>>>>> Another concern is the docker pull ziemin/pio-testing in the Travis
>>>>> config file. There is also a testing/Dockerfile which starts from ubuntu.
>>>>> So I think we should either use docker pull ubuntu, or use a pre-built
>>>>> testing Docker image instead of the Dockerfile.
>>>>> 
>>>>> Best
>>>>> Xusen Yin
>>>>> 
>>>>>> On Jul 22, 2016, at 2:52 PM, Marcin Ziemiński <[email protected]> wrote:
>>>>>> 
>>>>>> Hi!
>>>>>> 
>>>>>> I have a feeling that PredictionIO is lacking integration tests. TravisCI
>>>>>> runs only the unit tests residing in the repository. Better tests are
>>>>>> important not only for keeping up the quality of the project, but also
>>>>>> for the sheer comfort of development. Therefore, I tried to come up with
>>>>>> some simple basis for adding and building tests.
>>>>>> 
>>>>>> - Integration tests should be agnostic to environment settings (it
>>>>>>   should not matter whether we use Postgres or HBase)
>>>>>> - They should be easy for developers to run, and the configuration
>>>>>>   should not pollute their working space
>>>>>> 
>>>>>> I have pushed a sequence of commits to my personal fork and ran Travis
>>>>>> builds on them - Diff with upstream:
>>>>>> https://github.com/apache/incubator-predictionio/compare/develop...Ziemin:testing-infrastructure
>>>>>> 
>>>>>> 
>>>>>> The following changes were introduced:
>>>>>> 
>>>>>> - A dedicated Docker image was prepared. This image fetches and prepares
>>>>>>   some possible dependencies for PredictionIO - postgres, hbase, spark,
>>>>>>   elasticsearch.
>>>>>>   Upon container initialization all services are started, including a
>>>>>>   Spark standalone cluster. The best way to start it is to use the
>>>>>>   testing/run_docker.sh script, which binds relevant ports and mounts
>>>>>>   shared directories with the ivy2 cache and PredictionIO's code
>>>>>>   repository. More importantly, it sets up pio's configuration, e.g.:
>>>>>>   $ /run_docker.sh PGSQL HBASE HDFS ~/projects/incubator-predictionio
>>>>>>   '/pio_host/testing/simple_scenario/run_scenario.sh'
>>>>>>   This command sets the metadata repo to PGSQL, event data to HBASE and
>>>>>>   model data to HDFS. The last two arguments are the path to the repo
>>>>>>   and a command to run from inside the container.
>>>>>>   An important thing to note is that the container expects a tar with
>>>>>>   the built distribution in the shared /pio_host directory, which is
>>>>>>   later unpacked.
>>>>>>   The user can then safely execute all pio ... commands. By default the
>>>>>>   container pops up a bash shell if not given any other command.
>>>>>> - Currently there is only one simple test added, which is just a copy
>>>>>>   of the steps mentioned in the quickstart tutorial.
>>>>>> - .travis.yml was modified to run 4 concurrent builds: one for unit
>>>>>>   tests as previously, and three integration tests for various
>>>>>>   combinations of services:
>>>>>> env:
>>>>>>   global:
>>>>>>     - PIO_HOME=`pwd`
>>>>>> 
>>>>>>   matrix:
>>>>>>     - BUILD_TYPE=Unit
>>>>>>     - BUILD_TYPE=Integration METADATA_REP=PGSQL EVENTDATA_REP=PGSQL MODELDATA_REP=PGSQL
>>>>>>     - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=HBASE MODELDATA_REP=LOCALFS
>>>>>>     - BUILD_TYPE=Integration METADATA_REP=ELASTICSEARCH EVENTDATA_REP=PGSQL MODELDATA_REP=HDFS
>>>>>> Here you can find the build logs:
>>>>>> https://travis-ci.org/Ziemin/incubator-predictionio/builds/146753806
>>>>>> What is more, to make build times shorter, ivy jars are now cached on
>>>>>> Travis, so that they are fetched faster in subsequent builds.
>>>>>> 
>>>>>> The current setup lets developers run tests for different environment
>>>>>> settings in a deterministic way, as well as use Travis or other CI tools
>>>>>> in a more convenient way. What is left to do now is to prepare a sensible
>>>>>> set of different tests written in a concise and extensible way. I think
>>>>>> that ideally we could use the Python API and add a small library to it
>>>>>> focused strictly on our testing purposes.
>>>>>> 
>>>>>> Any insights would be invaluable.
>>>>>> 
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> -- Marcin