Re: [Avocado-devel] RFC: Avocado Job API

Lukáš Doktor Wed, 13 Apr 2016 04:44:29 -0700

On 13.4.2016 at 06:20, Cleber Rosa wrote:



On 04/12/2016 06:43 AM, Lukáš Doktor wrote:

On 12.4.2016 at 10:06, Lukáš Doktor wrote:

Hello Cleber,

In general, I welcome this RFC. This is my third attempt at making my
response understandable. I mention the problems first; some
explanations follow at the end of the email.

On 11.4.2016 at 14:09, Cleber Rosa wrote:

Note: the same content of this message is available at:

https://github.com/clebergnu/avocado/blob/rfc_job_api/docs/rfcs/job-api.rst




Some users may find it easier to read with prettier formatting.

Problem statement
=================

An Avocado job is created by running the command line ``avocado``
application with the ``run`` command, such as::

   $ avocado run passtest.py

But most of Avocado's power is activated by additional command line
arguments, such as::

   $ avocado run passtest.py --vm-domain=vm1
   $ avocado run passtest.py --remote-hostname=machine1

Even though Avocado supports many features, such as running tests
locally, on a Virtual Machine and on a remote host, only one of those
can be used in a given job.

The observed limitations are:

* Job creation is limited by the expressiveness of command line
   arguments; this causes mutual exclusion of some features
* Mapping features to a subset of tests or conditions is not possible
* Once created, and while running, a job cannot have its status
   queried and cannot be manipulated

Even though Avocado is a young project, its current feature set
already exceeds its flexibility.  Unfortunately, advanced users are
not always free to mix and match those features at will.

Reviewing and Evaluating Avocado
================================

In light of the given problem, let's take a look at what Avocado is,
both by definition and based on its real-world, day-to-day usage.

Avocado By Definition
---------------------

Avocado is, by definition, "a set of tools and libraries to help with
automated testing".  Here, some points can be made about the two
components that Avocado is made of:

1. Libraries are commonly flexible enough and expose the right
    features in a consistent way.  Libraries that provide good APIs
    allow users to solve their own problems, not always anticipated by
    the library authors.

2. The majority of the Avocado library code falls into two categories:
    utility and test APIs.  Avocado's core libraries are, so far, not
    intended to be consumed by third-party code, and their use is not
    supported in any way.

3. Tools (as in command line applications) are commonly a lot less
    flexible than libraries.  Even the ones driven by command line
    arguments, configuration files and environment variables fall
    short in flexibility when compared to libraries.  That is true even
    when respecting the basic UNIX principles and features that help to
    reuse and combine different tools in a single shell session.

How Avocado is used
-------------------

The vast majority of the observed Avocado use cases, present and
future, include running tests.  Given the Avocado architecture and
its core concepts, this means running a job.

Avocado, with regard to its real-world usage, is pretty much a job
(and test) runner, and there's no escaping that.  It's probable that,
for every hundred ``avocado run`` commands, only one other
``avocado <subcommand>`` is executed.

Proposed solution & RFC goal
----------------------------

By now, the title of this document may seem a little less
misleading. Still, let's attempt to make it even more clear.

Since Avocado is mostly a job runner that needs to be more flexible,
the most natural approach is to turn more of it into a library.  This
would lead to the creation of a new set of user consumable APIs,
albeit for a different set of users.  Those APIs should allow the
creation of custom job executions, in ways that the Avocado authors
have not yet anticipated.

Having settled on this solution to the stated problem, the primary
goal of this RFC is to propose how such a "Job API" can be
implemented.

Analysis of a Job Environment
=============================

To properly implement a Job API, it's necessary to review what
influences the creation and execution of a job.  Currently, a Job
execution based on the command line is driven by at least the
following factors:

* Configuration state
* Command line parameters
* Active plugins

The following subsections examine how these would behave in an API
based approach to Job execution.

Configuration state
-------------------

Even though Avocado has a well defined `settings`_ module, it only
provides support for `getting the value`_ of configuration keys. It
lacks the ability to set configuration values at run time.

If the configuration state allowed modifications at run time (in a
well defined and supported way), users could then create many types of
custom jobs with that "tool" alone.
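
A minimal sketch of a run-time settable configuration, assuming a plain
``configparser`` backend. ``RuntimeSettings`` is a hypothetical name, not
Avocado's ``settings`` module; the point is only that a supported
``set()`` can live right next to ``get()``.

```python
# Hypothetical sketch: a settings object that can be read *and* modified
# at run time, built on the standard library's configparser.
import configparser


class RuntimeSettings:
    def __init__(self, ini_text=""):
        self._parser = configparser.ConfigParser()
        self._parser.read_string(ini_text)

    def get(self, section, key, default=None):
        # fallback is returned when the section or the key is missing
        return self._parser.get(section, key, fallback=default)

    def set(self, section, key, value):
        # Create the section on demand; ConfigParser only stores strings
        if not self._parser.has_section(section):
            self._parser.add_section(section)
        self._parser.set(section, key, str(value))


settings = RuntimeSettings("[sysinfo.collect]\nprofiler = False\n")
print(settings.get("sysinfo.collect", "profiler"))  # False (from the file)
settings.set("sysinfo.collect", "profiler", True)
print(settings.get("sysinfo.collect", "profiler"))  # True (set at run time)
```

Values come back as strings, so callers would still need a typed
accessor (such as ``configparser``'s ``getboolean()``) on top.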

Command line parameters
-----------------------

The need for a strong and predictable correlation between application
builtin defaults, configuration keys and command line parameters is
also a MUST for the implementation of the Job API.

Users writing a custom job will very often need to set a given
behavior that may influence different parts of the Job execution.

Not only that, many use cases may be implemented simply by changing
those defaults in the midst of the job execution.

If users know how to map command line parameters into their
programmable counterparts, advanced custom jobs will be created much
more naturally.
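
One way to make that correlation predictable is a single precedence
rule: command line over configuration file over builtin default. The
sketch below assumes that rule; the ``--sysinfo-profiler`` option and the
``resolve_option()`` helper are hypothetical and only show the mapping.

```python
# Hypothetical sketch: resolve one option from the builtin default, the
# configuration file and the command line (highest precedence last).
import argparse
import configparser

BUILTIN_DEFAULTS = {("sysinfo.collect", "profiler"): "False"}


def resolve_option(section, key, config, cli_value):
    """Return the effective value: CLI > config file > builtin default."""
    if cli_value is not None:            # explicitly given on the CLI
        return cli_value
    if config.has_option(section, key):  # present in the config file
        return config.get(section, key)
    return BUILTIN_DEFAULTS[(section, key)]


config = configparser.ConfigParser()
config.read_string("[sysinfo.collect]\nprofiler = True\n")

parser = argparse.ArgumentParser()
# default=None lets us tell "not given" apart from an explicit value
parser.add_argument("--sysinfo-profiler", default=None)

args = parser.parse_args([])
print(resolve_option("sysinfo.collect", "profiler", config,
                     args.sysinfo_profiler))   # True (from config)

args = parser.parse_args(["--sysinfo-profiler", "off"])
print(resolve_option("sysinfo.collect", "profiler", config,
                     args.sysinfo_profiler))   # off (from the CLI)
```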

Plugins
-------

Avocado currently relies exclusively on setuptools `entry points`_ to
define the active plugins.  It may be beneficial to add a secondary
activation and deactivation mechanism, one that is locally
configurable.  This is a rather common pattern, and well supported by
the underlying stevedore library.

Once all pluggable components of Avocado are updated to adhere to
the "new plugin" standard, some use cases could be implemented simply
by enabling/disabling plugins (think of "driver" style plugins).  This
can be done exclusively, or in addition to setting the plugin's own
configuration.

Also, depending on the type of plugin, it may be useful to activate,
deactivate and configure those plugins per job.  Thus, as part of the
Job state, APIs would allow for querying/setting plugins.
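
A minimal sketch of per-job activation on top of discovery, assuming
each plugin exposes a ``name`` and the job holds a locally configured set
of disabled names. The plugin classes and the ``DISCOVERED`` list are
hypothetical stand-ins for what entry-point discovery would return;
stevedore's ``EnabledExtensionManager``, which accepts a ``check_func``,
offers a comparable filter at the library level.

```python
# Hypothetical sketch: discovered plugins filtered per job by a locally
# configured "disabled" set.


class HtmlResult:
    name = "html"


class JsonResult:
    name = "json"


class XunitResult:
    name = "xunit"


# Stand-in for the result of setuptools entry-point discovery
DISCOVERED = [HtmlResult, JsonResult, XunitResult]


def active_plugins(discovered, disabled):
    """Instantiate every discovered plugin whose name is not disabled."""
    return [cls() for cls in discovered if cls.name not in disabled]


job_a = active_plugins(DISCOVERED, disabled={"html"})
job_b = active_plugins(DISCOVERED, disabled=set())
print([p.name for p in job_a])  # ['json', 'xunit']
print([p.name for p in job_b])  # ['html', 'json', 'xunit']
```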

Use cases
=========

To aid in the design of an API that solves unforeseen needs, let's
think about a couple of use cases.  Most of these use cases are based
on feedback already received and/or features already requested.

Ordered and conditional test execution
--------------------------------------

A user wants to create a custom job that only runs a benchmark test on
a VM if the VM installation test succeeds.

Possible use case fulfillment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pseudo code::

   #!/usr/bin/env python
   from avocado import Job
   from avocado.resolver import resolve

   job = Job()

   vm_install = resolve(
       'io-github-autotest-qemu.unattended_install.cdrom.http_ks.default_install.aio_native')

What plugins are used during "resolve"?


   vm_disk_benchmark = resolve('io-github-autotest-qemu.autotest.bonnie')

   if job.run_test(vm_install).result == 'PASS':
       job.run_test(vm_disk_benchmark)
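
To make the intended semantics concrete, here is a self-contained mock
of the pseudo code above. ``Job``, ``run_test()`` and the ``.result``
attribute mirror the proposal; everything else is hypothetical, and tests
are plain callables instead of resolved references.

```python
# Hypothetical mock of the proposed API; not Avocado code.
from dataclasses import dataclass


@dataclass
class Outcome:
    name: str
    result: str   # 'PASS' or 'FAIL'


class Job:
    def __init__(self):
        self.results = []

    def run_test(self, test):
        """Run one test immediately and return its outcome."""
        try:
            test()
            status = "PASS"
        except AssertionError:
            status = "FAIL"
        outcome = Outcome(test.__name__, status)
        self.results.append(outcome)
        return outcome


def vm_install():
    pass                     # pretend the installation succeeded


def vm_disk_benchmark():
    pass


job = Job()
if job.run_test(vm_install).result == "PASS":
    job.run_test(vm_disk_benchmark)   # only runs after a good install
print([(r.name, r.result) for r in job.results])
# [('vm_install', 'PASS'), ('vm_disk_benchmark', 'PASS')]
```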

API Requirements
~~~~~~~~~~~~~~~~

1. Job creation API
2. Test resolution API
3. Single test execution API

Run profilers on a single test
------------------------------

A user wants to create a custom job that only runs profilers for the
very first test.  Running the same profilers for all other tests may
be useless to the user, or may consume enough I/O resources to
influence the remaining tests.

Possible use case fulfillment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Avocado has a configuration key that controls profilers::

   [sysinfo.collect]
   ...
   profiler = False
   ...

By exposing the configuration state, the ``profiler`` key of the
``sysinfo.collect`` section could be enabled for one test, and
disabled for all others. Pseudo code::

   #!/usr/bin/env python
   from avocado import Job
   from avocado.resolver import resolve

   job = Job()
   env = job.environment # property

   env.config.set('sysinfo.collect', 'profiler', True)

What config file is used?

   job.run_test(resolve('build'))

   env.config.set('sysinfo.collect', 'profiler', False)
   job.run_test(resolve('benchmark'))
   job.run_test(resolve('stress'))
   ...
   job.run_test(resolve('netperf'))
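
The example relies on the job reading the configuration when each test
starts, so a change only affects the tests that follow it. A hypothetical
mock of that behaviour, with ``job.config`` standing in for
``job.environment.config`` and test names instead of resolved tests:

```python
# Hypothetical mock: 'profiler' is read at test start, so toggling it
# between run_test() calls affects only the subsequent tests.
import configparser


class Job:
    def __init__(self):
        self.config = configparser.ConfigParser()
        self.config.read_string("[sysinfo.collect]\nprofiler = False\n")
        self.profiled = {}

    def run_test(self, name):
        # Snapshot the setting when the test starts
        self.profiled[name] = self.config.getboolean("sysinfo.collect",
                                                     "profiler")


job = Job()
job.config.set("sysinfo.collect", "profiler", "True")
job.run_test("build")
job.config.set("sysinfo.collect", "profiler", "False")
for name in ("benchmark", "stress", "netperf"):
    job.run_test(name)
print(job.profiled)
# {'build': True, 'benchmark': False, 'stress': False, 'netperf': False}
```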

API Requirements
~~~~~~~~~~~~~~~~

1. Job creation API
2. Test resolution API
3. Configuration API
4. Single test execution API

Multi-host test execution
-------------------------

Use case description
~~~~~~~~~~~~~~~~~~~~

A user needs to run the same test on different platforms.  The user
has hosts with the different platforms already set up and remotely
accessible.

Possible use case fulfillment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Avocado currently runs all tests in a job with a single runner.  The
`default runner`_ implementation is a local test runner.  Other test
runners include the `remote runner`_ and the `vm runner`_.

Pseudo code such as the following could implement the (serial, for
simplicity) test execution on multiple different hosts::

   from avocado import Job
   from avocado.plugin_manager import require
   from avocado.resolver import resolve

   job = Job()
   print('JOB ID: %s' % job.unique_id)
   print('JOB LOG: %s' % job.log)

   runner_plugin = 'avocado.plugins.runner:RemoteTestRunner'
   require(runner_plugin)

What plugins are loaded by default and what needs to be required?


   env = job.environment # property
   env.config.set('plugin.runner', 'default', runner_plugin)
   env.config.set('plugin.runner.RemoteTestRunner', 'username', 'root')
   env.config.set('plugin.runner.RemoteTestRunner', 'password',
                  '123456')

   test = resolve('hardware_validation.py:RHEL.test')

   host_list = ['rhel6.x86_64.internal',
                ...
                'rhel7.ppc64.internal']

   for host in host_list:
       env.config.set('plugin.runner.RemoteTestRunner', 'host', host)

The remote runner (like most of the plugins) does not support per-test
granularity. I ran into this problem with the multi-test RFC, and we
have to do something about it...

       job.run_test(test)

   print('JOB STATUS: %s' % job.status)

It's actually quite simple to move from a custom Job execution to a
custom Job runner, for example::

   #!/usr/bin/env python
   import sys
   from avocado import Job
   from avocado.plugin_manager import require
   from avocado.resolver import resolve

   test = resolve(sys.argv[1])
   host_list = sys.argv[2:]

   runner_plugin = 'avocado.plugins.runner:RemoteTestRunner'
   require(runner_plugin)

   job = Job()
   print('JOB ID: %s' % job.unique_id)
   print('JOB LOG: %s' % job.log)

I don't think we need to print those manually. Avocado uses the
`avocado.*` streams to communicate with the outside world, and it should
keep using them unless the user disables (or changes) them. Note that
Avocado should not change that setting itself: in this usage it's the
library, not the application.

   env = job.environment # property
   env.config.set('plugin.runner', 'default', runner_plugin)
   env.config.set('plugin.runner.RemoteTestRunner', 'username', 'root')
   env.config.set('plugin.runner.RemoteTestRunner', 'password',
                  '123456')

   for host in host_list:
       env.config.set('plugin.runner.RemoteTestRunner', 'host', host)
       job.run_test(test)

   print('JOB STATUS: %s' % job.status)

This took me a while to understand. The difference is that you allow
some things to be set from the command line.

Which could be run as::

   $ multi hardware_validation.py:RHEL.test rhel{6,7}.{x86_64,ppc64}.internal
   JOB ID: 54cacfb42f3fa9566b6307ad540fbe594f4a5fa2
   JOB LOG: /home/<user>/avocado/job-results/job-2016-04-07T16.46-54cacfb/job.log
   JOB STATUS: AVOCADO_ALL_OK
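
The suggestion to rely on the ``avocado.*`` logging streams instead of
printing follows the usual Python library convention: the library emits
records and the application decides where they go. A sketch with the
standard ``logging`` module; the ``avocado.job`` logger name and the
``start_job()`` helper are examples only.

```python
# Library vs. application responsibilities with the logging module.
import logging

# --- library side: emit records, never decide where they end up ---
log = logging.getLogger("avocado.job")
log.addHandler(logging.NullHandler())   # no "no handler" warnings


def start_job(job_id):
    log.info("JOB ID: %s", job_id)


# --- application side: the script using the library picks the output ---
logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")
start_job("54cacfb")   # "avocado.job: JOB ID: 54cacfb" on stderr

# ...or it silences the whole avocado.* hierarchy
logging.getLogger("avocado").setLevel(logging.CRITICAL)
start_job("ignored")   # dropped: effective level is now CRITICAL
```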

Brainstorming here: how about allowing people to invoke the Avocado
parser, which would modify the `config` values? We don't have this
mapping yet, but we have talked about the need to unify config,
multiplexer and args.

The workflow would be:

1. parse args:
     - get urls
     - get the remote-hostname value and split it into hosts
     - [optionally] get additional arguments, e.g. from
       --job-params foo=bar [...]
2. instantiate the plugin with overridden values
3. run the test
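
Step 1 of that workflow could look like the following ``argparse``
sketch. The option names follow the text above; the splitting rules
(comma-separated hosts, ``key=value`` job params) and the
``parse_job_args()`` helper are assumptions. Its output would then feed
steps 2 and 3.

```python
# Hypothetical sketch of step 1: parse URLs, split hosts, collect params.
import argparse


def parse_job_args(argv):
    parser = argparse.ArgumentParser(prog="multi")
    parser.add_argument("urls", nargs="+", help="test references")
    parser.add_argument("--remote-hostname", default="",
                        help="comma-separated list of hosts")
    parser.add_argument("--job-params", nargs="*", default=[],
                        metavar="KEY=VALUE")
    args = parser.parse_args(argv)
    hosts = [h for h in args.remote_hostname.split(",") if h]
    # split on the first '=' only, so values may themselves contain '='
    params = dict(p.split("=", 1) for p in args.job_params)
    return args.urls, hosts, params


urls, hosts, params = parse_job_args(
    ["hardware_validation.py:RHEL.test",
     "--remote-hostname", "rhel6.x86_64.internal,rhel7.ppc64.internal",
     "--job-params", "foo=bar"])
print(urls)    # ['hardware_validation.py:RHEL.test']
print(hosts)   # ['rhel6.x86_64.internal', 'rhel7.ppc64.internal']
print(params)  # {'foo': 'bar'}
```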

API Requirements
~~~~~~~~~~~~~~~~

1. Job creation API
2. Test resolution API
3. Configuration API
4. Plugin Management API
5. Single test execution API

Current shortcomings
~~~~~~~~~~~~~~~~~~~~

1. The current Avocado runner implementations do not follow the "new
    style" plugin standard.

2. There's no concept of a job environment.

3. Lack of a uniform definition of plugin implementation for "driver"
    style plugins.

4. Lack of automatic ownership of the configuration namespace by plugin
    name.
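
A minimal sketch of what automatic namespace ownership might mean: each
plugin receives a configuration view restricted to a section derived from
its class name. The ``plugin.<kind>.<ClassName>`` scheme mirrors the
pseudo code above; ``PluginConfig`` is hypothetical, not an existing
Avocado rule or class.

```python
# Hypothetical sketch: a plugin can only read and write its own section.
import configparser


class PluginConfig:
    def __init__(self, parser, kind, plugin_cls):
        self._parser = parser
        self.section = "plugin.%s.%s" % (kind, plugin_cls.__name__)
        if not parser.has_section(self.section):
            parser.add_section(self.section)

    def get(self, key, default=None):
        return self._parser.get(self.section, key, fallback=default)

    def set(self, key, value):
        # Writes can only ever land in the plugin's own section
        self._parser.set(self.section, key, str(value))


class RemoteTestRunner:
    pass


parser = configparser.ConfigParser()
cfg = PluginConfig(parser, "runner", RemoteTestRunner)
cfg.set("username", "root")
print(cfg.section)                          # plugin.runner.RemoteTestRunner
print(parser.get(cfg.section, "username"))  # root
```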


Other use cases
===============

The following is a list of other valid use cases which can be
discussed at a later time:

* Use the multiplexer only for some tests.

The multiplexer creates multiple tests. What is the result of
`run_test(test)` then?


* Use the gdb or wrapper feature only for some tests.

* Run Avocado tests and external-runner tests in the same job.

Nitpick: This is currently possible by using simple tests with
arguments.


* Run tests in parallel.

And here we go. Everything up to this point is doable, but this is an
amazing and much-needed feature with lots of challenges, because
currently:

1. plugins are per-process
2. plugins are usually per-job
3. config values are not copied for each test (we can't change them
   during execution unless we do so)


* Take actions based on test results (for example, run or skip other
   tests)

* Post-process the logs or test results before the job is done

I'd say only the logs. IIRC, the conclusion of the multi-test RFC
discussion was that a job must never, ever change the results. It just
runs tests; when there are failures, it fails. It is only a container,
and it must never say "this test failed, but what the heck, let's pass".


Development Milestones
======================

Since it's clear that Avocado demands many changes to be able to
completely fulfill all mentioned use cases, it seems like a good idea
to define milestones.  Those milestones are not intended to set the
pace of development, but to allow the maximum number of real-world
use cases to be fulfilled as soon as possible.

Milestone 1
-----------

Includes the delivery of the following APIs:

* Job creation API
* Test resolution API
* Single test execution API

Milestone 2
-----------

Adds to the previous milestone:

* Configuration API

Milestone 3
-----------

Adds to the previous milestone:

* Plugin management API

Milestone 4
-----------

Introduces proper interfaces where previously Configuration and Plugin
management APIs were being used.  For instance, where the following
pseudo code was being used to set the current test runner::

   env = job.environment
   env.config.set('plugin.runner', 'default',
                  'avocado.plugins.runner:RemoteTestRunner')
   env.config.set('plugin.runner.RemoteTestRunner', 'username', 'root')
   env.config.set('plugin.runner.RemoteTestRunner', 'password',
                  '123456')

APIs would be introduced that would allow for the following pseudo
code::

   job.load_runner_by_name('RemoteTestRunner')
   if job.runner.accepts_credentials():
       job.runner.set_credentials(username='root', password='123456')

I do like this.
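
A self-contained mock of the Milestone 4 interface, to show how it could
replace the raw ``config.set()`` calls. ``load_runner_by_name()``,
``accepts_credentials()`` and ``set_credentials()`` are the names from the
proposal; the registry and runner classes are hypothetical.

```python
# Hypothetical mock of the proposed runner-selection interface.


class LocalTestRunner:
    def accepts_credentials(self):
        return False


class RemoteTestRunner:
    def __init__(self):
        self.username = self.password = None

    def accepts_credentials(self):
        return True

    def set_credentials(self, username, password):
        self.username, self.password = username, password


RUNNERS = {cls.__name__: cls for cls in (LocalTestRunner, RemoteTestRunner)}


class Job:
    def __init__(self):
        self.runner = LocalTestRunner()

    def load_runner_by_name(self, name):
        # An unknown name raises KeyError instead of being silently ignored
        self.runner = RUNNERS[name]()


job = Job()
job.load_runner_by_name("RemoteTestRunner")
if job.runner.accepts_credentials():
    job.runner.set_credentials(username="root", password="123456")
print(type(job.runner).__name__, job.runner.username)  # RemoteTestRunner root
```

One benefit over plain configuration keys: a misspelled keyword such as
``usrname=`` fails immediately with a ``TypeError`` instead of creating an
unused key.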


.. _settings: https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/settings.py
.. _getting the value: https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/settings.py#L221
.. _default runner: https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/runner.py#L193
.. _remote runner: https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/remote/runner.py#L37
.. _vm runner: https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/remote/runner.py#L263
.. _entry points: https://pythonhosted.org/setuptools/pkg_resources.html#entry-points


Uff, lots of thoughts. Let's start with the plugins:

Currently we have

1. subcommand plugins - not related to this email (run, list, multiplex)
2. avocado plugins - related to the whole of Avocado (config)
3. discovery - maps URLs to tests (loaders)
4. job-related - modify the job execution (html, json, remote, sysinfo,
   vm, xunit)
5. test-related - allow tweaking the test execution (gdb, wrapper,
   sysinfo)
6. variants-generating - related to the test, but result in several
   test variants (multiplexer)


Actually, let me elaborate more on the plugins. IMO we should create
`avocado.plugins.job.*` and `avocado.plugins.test.*` plugin namespaces,
where:


If you're talking about module namespaces, it would only add to the
aesthetics of the project source tree directory.  What actually defines
what kind of plugin something is, is the interface it implements and
the setuptools namespace it's registered under.

That was meant for human readability...


`avocado.plugins.job` - is the type (4) plugin. It should contain the
TestResult hooks (which define everything one might need to tweak the
job).


This is just wrong.  Can you imagine documenting that, to tweak a job,
you "just" have to add a "*test result*" hook?

IMO it's correct; just the name is wrong. In fact, the test result has
hooks all over the test. Remember, for a period of time it was what
drove the `Remote` execution, which only proves it's very flexible.
(I'm not proposing that the plugin API should be the TestResult class,
but that TestResult should be implemented as this type of plugin.)

Btw, this is something I can't imagine implementing the way pre/post
plugins work. It requires state between hooks.
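
The kind of state meant here can be shown with a small TestResult-style
plugin that remembers when each test started. ``TimingResult`` and its
hook names are hypothetical; with separate pre/post plugins, the start
time would have to be stashed elsewhere, e.g. on the test state.

```python
# Hypothetical sketch: one object keeps state between the start and end
# hooks of the same test.
import time


class TimingResult:
    def __init__(self):
        self._started = {}
        self.durations = {}

    def start_test(self, name):
        self._started[name] = time.monotonic()

    def end_test(self, name, status):
        # Consumes the state recorded by start_test() for this test
        elapsed = time.monotonic() - self._started.pop(name)
        self.durations[name] = (status, elapsed)


result = TimingResult()
result.start_test("passtest.py")
result.end_test("passtest.py", "PASS")
status, elapsed = result.durations["passtest.py"]
print(status, elapsed >= 0)   # PASS True
```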

`avocado.plugins.test.*` - is the type (5) plugin, which should allow
pre- and post-test interaction.


Again, the module namespace doesn't play that role, but I understand
that you're proposing pre and post test plugins, which was already
discussed in the other RFC.

Yep, I said I could live with separate plugins, because you can pass
the state through the test state (although having it as one plugin
would be more natural to me).

I'm still struggling with the RemoteRunner. It does not fit any of
these categories: it's job-related, but it modifies the runner. So
maybe, similarly to the multiplexer, it deserves another category, as it
modifies the test execution and requires per-test granularity (to allow
running tests on different machines):


Agreed.  The "test runner" is another piece of Avocado that deserves
formal and well defined interfaces.


     job.run_test(test, params=None, runner=None)


We won't be able to keep adding parameters to run_test() forever.
That's why I believe the "job environment" concept is easier to digest
and extend.

I know, this was for convenience, but we'll see which items are the
most popular once people start using it. Then we'll have plenty of
time to improve the API. In general, the environment is the way to go
(but remember fabric and its issues with a shared environment; I know
in our case it's per-job, but it's still limiting).

where the runner would be heavily simplified and it should support:

* setUp()  # get the connection, check for avocado...
* run_test()  # runs the test
* tearDown()


Sure, you're defining the interface that "test runner" plugins would
have to implement.

Example:

     job.run_test(test)
     for host in hosts:
         job.run_test(test, runner=RemoteRunner(host))
     runner = RemoteRunner(host)
     for test in [test1, test2, test3]:
         job.run_test(test, runner)
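
A self-contained sketch of that runner interface (``setUp``,
``run_test``, ``tearDown``) together with a ``Job.run_test()`` that takes
an optional runner, following the example above. All classes are
hypothetical, and ``RemoteRunner`` only records where a test would have
run.

```python
# Hypothetical sketch of the simplified runner interface.


class LocalRunner:
    def setUp(self):
        pass

    def run_test(self, test):
        return (test, "localhost")

    def tearDown(self):
        pass


class RemoteRunner:
    def __init__(self, host):
        self.host = host
        self.connected = False

    def setUp(self):
        # a real runner would connect and check that avocado is installed
        self.connected = True

    def run_test(self, test):
        if not self.connected:
            raise RuntimeError("setUp() must be called first")
        return (test, self.host)

    def tearDown(self):
        self.connected = False


class Job:
    def __init__(self):
        self.results = []

    def run_test(self, test, runner=None):
        if runner is None:
            runner = LocalRunner()
        runner.setUp()
        try:
            self.results.append(runner.run_test(test))
        finally:
            runner.tearDown()


job = Job()
job.run_test("passtest.py")
for host in ("host1", "host2"):
    job.run_test("passtest.py", runner=RemoteRunner(host))
print(job.results)
# [('passtest.py', 'localhost'), ('passtest.py', 'host1'),
#  ('passtest.py', 'host2')]
```

Calling ``setUp()``/``tearDown()`` around every test keeps the sketch
simple but reconnects each time; reusing one ``RemoteRunner(host)`` for
several tests, as in the example, suggests the real interface might set
up once per runner instead.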

Note: if you dislike passing it as an argument, we can always set it
via the environment or directly on the job. Either way, that would
require the test to copy this value before `trigger_test` returns, as
another test might want to modify it:


Sure.  The implementation level details such as using the environment or
not will certainly come up a little bit later IMHO.  The important thing
is to understand if we have the same general view.

I believe we do (as I said, I basically agree with this RFC). I was
just wondering what the alternatives are to get things done.


     job.runner = RemoteRunner(host)
     job.trigger_test(test)
     # the test must read/store the "runner" before returning
     job.runner = None
     job.trigger_test(test)
     job.wait()
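
A sketch of the ``trigger_test()``/``wait()`` semantics, assuming a
thread pool as the background mechanism: the runner is read when the test
is triggered, so resetting ``job.runner`` afterwards only affects later
tests. The ``Job`` class is a hypothetical mock, and a string stands in
for a runner object.

```python
# Hypothetical sketch: capture job.runner at trigger time.
from concurrent.futures import ThreadPoolExecutor


class Job:
    def __init__(self):
        self.runner = None              # None means "run locally"
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._pending = []

    @staticmethod
    def _execute(test, runner):
        # A real job would run the test here; we only report the runner
        return (test, runner if runner is not None else "local")

    def trigger_test(self, test):
        runner = self.runner            # read/store it before returning
        self._pending.append(self._pool.submit(self._execute, test, runner))

    def wait(self):
        """Block until triggered tests finish; return results in order."""
        results = [future.result() for future in self._pending]
        self._pending.clear()
        return results


job = Job()
job.runner = "RemoteRunner(host1)"
job.trigger_test("passtest.py")
job.runner = None
job.trigger_test("passtest.py")
print(job.wait())
# [('passtest.py', 'RemoteRunner(host1)'), ('passtest.py', 'local')]
```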

Anyway these are details. My point is that Runner and Multiplexer are
special.


They're certainly very important.  But as much as possible, all the
pluggable moving parts should behave similarly.

On the level they are related to. That's why I put the plugin
categorization here. If there is a plugin related only to tests, it
should be bound only to tests, etc.


Again, thanks for the feedback!
  - Cleber.

Thx to you ;-)


Some of the type (4) plugins are related to the test rather than the
job, but this was not a big issue until now, as the runner did not
allow mixing them. If we want to allow this, then we should modify:

* remote - to trigger tests rather than jobs (the benefit is that the
  default runner would get per-test updates, and we'd probably get rid
  of the RemoteResults)
* sysinfo - no actual modification needed (when not running in
  parallel), but logically it should be split so that it is triggered
  before/after the job (category (4)) and before/after each test (5)
* vm - the same as remote

Category (5) is per-test, but currently does not support modification
at run time. There are additional problems when we want to support
parallel execution, as gdb and wrapper are set per-process.

So to solve all those problems, we can either make those plugins
test-process-aware (really ugly), or instantiate the plugin inside the
test process (the `plugin.run` would have to be executed inside
`avocado.core.runner._run_test`).


So to combine my thoughts, the workflow should IMO be (optional user
steps not related to Avocado are marked by '*'):

*  initialize logging, pop some arguments from sys.argv, ask for user
   input, ...
1. allow running `from avocado import parser; parser.parse()` to parse
   either the given arguments, or `sys.argv` when None
2. initialize the config (this also happens on parser.parse(), along
   with updating the values from args)
*  modify Job-related config (tweak the job plugins (4))
3. create a Job()
4. instantiate variants-generating plugin(s)
*  modify Test-related config
*  instantiate Test-related plugins
*  process logs
*  yield generated variant
5. run/trigger test
...
6. end job
*  whatever the user wants to do after the job ends
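
Step (4) of that workflow, a variants-generating object the user pulls
parameter sets from, can be sketched as a generator over the cartesian
product of axes. The axes and values are made up and ``variants()`` is
hypothetical; the real multiplexer builds its variants from YAML files.

```python
# Hypothetical sketch of step (4): yield one parameter dict per variant.
import itertools


def variants(axes):
    """Yield dicts holding one value from each axis (cartesian product)."""
    names = list(axes)
    for values in itertools.product(*(axes[n] for n in names)):
        yield dict(zip(names, values))


mux = variants({"distro": ["rhel6", "rhel7"], "arch": ["x86_64", "ppc64"]})
params = next(mux)
print(params)               # {'distro': 'rhel6', 'arch': 'x86_64'}
print(sum(1 for _ in mux))  # 3 (the remaining variants)
```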

Some explanations

(1) - is optional; without it, Avocado uses the default config path
     from avocado import parser
     config = parser.parse(["--config", "/foo.ini"])
     # or parser.parse(None) to use sys.argv

(2) - is executed lazily when `config` is first used (in step (1), or
e.g. in the resolver, or by using `Config` directly)
     from avocado import Config
     config = Config(file=None)
     config.get(...)
     config.set(...)

(3) - assigns the job ID and invokes the job plugins (e.g. pre-job hooks)
     from avocado import Job
     job = Job(config=None)

(4) - gives the object which allows yielding variants (and more)
     from avocado import multiplexer
     mux = multiplexer(files=None)
     # when files=None, use the `--multiplex` value (?)
     params = mux.next()

(5) - runs the test. There are two ways:
     # modify the avocado environment
     job.run_test()
     # job.run_test() instantiates the plugins, handles
     # the execution and reports the test results

     # Another way (preferred by me) is
     job.run_test(environment=None, params=None, ...)
     # which basically does the same

     # The difference shows when we use `trigger_test` to run in
     # the background: the first always needs to copy the whole
     # environment (deepcopy), while the second can rely on the user

(6) - finishes the job, including triggering the post-job plugins
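
The difference described in (5) can be sketched with two hypothetical
trigger methods on a mock ``Job``: one deep-copies a shared, mutable
environment at trigger time, the other stores whatever the caller passes
and leaves immutability to the user. Tests are only recorded, not run.

```python
# Hypothetical sketch of the two styles from step (5).
import copy


class Job:
    def __init__(self):
        self.environment = {"sysinfo.collect": {"profiler": False}}
        self.triggered = []

    def trigger_test_shared(self, test):
        # Style 1: the job owns a mutable environment, so it must take a
        # deep copy now; later changes must not leak into this test
        self.triggered.append((test, copy.deepcopy(self.environment)))

    def trigger_test_explicit(self, test, environment):
        # Style 2: no copy; the caller owns `environment` and must not
        # mutate it while the test is pending
        self.triggered.append((test, environment))


job = Job()
job.environment["sysinfo.collect"]["profiler"] = True
job.trigger_test_shared("build")
job.environment["sysinfo.collect"]["profiler"] = False  # build keeps True
job.trigger_test_explicit("stress", {"sysinfo.collect": {"profiler": True}})
print([(name, env["sysinfo.collect"]["profiler"])
       for name, env in job.triggered])
# [('build', True), ('stress', True)]
```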

As you can see, all steps except (3), (5) and (6) are optional and use
defaults, while still allowing modification. So the milestones would be:

1. steps (3), (5) and (6)
2. step (2)
3. step (4)
4. step (1)

Regards,
Lukáš


_______________________________________________
Avocado-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/avocado-devel
