[TVM Discuss] [Development/RFC] [RFC][TESTING] Split testing based on cpu/gpu

Tristan Konolige via TVM Discuss Tue, 25 Aug 2020 09:23:12 -0700


# Motivation


Our current test suite takes a long time to run. A main reason is that tests 
that only require a cpu are also run on testing nodes that have gpus. When 
multiple PRs are open, gpu nodes are often the limiting factor: demand is high, 
so a PR has to wait until a gpu node frees up before its tests can begin.

# Proposal

I propose we explicitly mark tests that require a gpu and run only marked tests 
on the gpu.
Pytest provides a mechanism to do this: 
[markers](https://docs.pytest.org/en/stable/example/markers.html).
Markers allow tests to be decorated with `@gpu` (for example) and then pytest 
can select only tests with this marker using `pytest -m gpu`.
Markers can be combined with 
[`pytest.mark.skipif`](https://docs.pytest.org/en/latest/skipping.html#id1) to 
make sure that tests only run when the required gpu is present.
I propose we use the following markers:
- `tvm.testing.uses_gpu` for tests that use both the gpu and cpu (see below).
- `tvm.testing.requires_gpu` for tests that require the gpu.
- `tvm.testing.requires_cuda` for tests that require cuda.
- `tvm.testing.requires_...` for tests that require rocm, opencl, etc.
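
As a rough illustration of how such markers could be built on top of pytest, here is a minimal sketch. It is not the proposed implementation: `_gpu_present` is a hypothetical stand-in for a real device check, and the marker name `gpu` is taken from the `pytest -m gpu` example above.

```python
import pytest


def _gpu_present() -> bool:
    """Hypothetical probe; a real version would ask the TVM runtime."""
    return False


# Sketch of `uses_gpu`: only tags the test so `pytest -m gpu` selects it.
uses_gpu = pytest.mark.gpu


def requires_gpu(func):
    """Sketch of `requires_gpu`: tag the test and skip it when no gpu exists."""
    func = pytest.mark.skipif(not _gpu_present(), reason="no gpu present")(func)
    return pytest.mark.gpu(func)


@uses_gpu
def test_mixed():
    # Would exercise both cpu and gpu targets.
    pass


@requires_gpu
def test_needs_gpu():
    # Skipped on nodes without a gpu.
    pass
```

With markers like these, a gpu node would run `pytest -m gpu` and a cpu node would run `pytest -m "not gpu"`.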

Many tests use a variety of different devices, like llvm, cuda, and rocm.
There are three main ways that tests use devices:
1. tests iterate through `tvm.relay.testing.config.ctx_list`,
2. tests iterate through 
   `tests/python/topi/python/common.py:get_all_backend`, and
3. tests iterate through a hand-picked list of targets and check if the device 
   is enabled with `tvm.context(device).exist` and 
   `tvm.runtime.enabled(device)`.

These methods do not allow us to separate out the gpu parts from the cpu parts.
To do this separation, I propose we merge 1. and 2. into a function called 
`tvm.testing.enabled_devices` and replace 3. with a function 
`tvm.testing.device_enabled`. These two functions would use an environment 
variable to determine which devices are enabled (a subset of the ones supported 
by the current build of TVM).
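
To make the idea concrete, here is a minimal sketch of how the two functions might read an environment variable. Everything specific in it is an assumption for illustration: the variable name `TVM_TEST_DEVICES`, the comma-separated format, and the hard-coded `SUPPORTED_DEVICES` tuple, which in practice would come from the current TVM build.

```python
import os

# Hypothetical: devices supported by the current build.
SUPPORTED_DEVICES = ("llvm", "cuda", "rocm", "opencl")

# Hypothetical environment variable name and format ("llvm,cuda").
ENV_VAR = "TVM_TEST_DEVICES"


def enabled_devices(environ=os.environ):
    """Return the supported devices selected by the environment variable.

    If the variable is unset, every supported device is enabled.
    """
    requested = environ.get(ENV_VAR)
    if requested is None:
        return list(SUPPORTED_DEVICES)
    names = {name.strip() for name in requested.split(",") if name.strip()}
    # Keep only devices the build supports, in a stable order.
    return [dev for dev in SUPPORTED_DEVICES if dev in names]


def device_enabled(device, environ=os.environ):
    """Return True if `device` is both supported and selected."""
    return device in enabled_devices(environ)
```

A cpu-only node could then export `TVM_TEST_DEVICES=llvm`, and tests that loop over `enabled_devices()` would skip the gpu targets without any change to the test body.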

## Cons

- Devices we test against are controlled by an environment variable. 
Environment variables can be hard to discover, so we should document this one 
well.
- Tests that use `tvm.testing.device_enabled` or `tvm.testing.enabled_devices` 
must also mark their testing function with `tvm.testing.uses_gpu`. If they 
don't, the test will never be run on gpu devices. A fix would be a special 
decorator that parameterizes the test over the devices and sets markers 
appropriately (using `pytest.mark.parametrize`). Unfortunately, this would 
require rewriting a large number of tests.
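
For reference, the decorator mentioned in the last point might look roughly like the sketch below. The name `parametrize_devices` and the `GPU_DEVICES` set are hypothetical; the sketch only shows that `pytest.param` can attach the `gpu` marker to individual parameter values, so the cpu and gpu instances of one test can be selected separately.

```python
import pytest

# Assumption: which device names count as gpu devices.
GPU_DEVICES = {"cuda", "rocm", "opencl"}


def parametrize_devices(devices):
    """Parametrize a test over `devices`, marking gpu instances with `gpu`."""
    params = [
        pytest.param(dev, marks=pytest.mark.gpu)
        if dev in GPU_DEVICES
        else pytest.param(dev)
        for dev in devices
    ]
    return pytest.mark.parametrize("device", params)


@parametrize_devices(["llvm", "cuda"])
def test_add(device):
    # A real test would build and run a kernel for `device`.
    assert isinstance(device, str)
```

Here `pytest -m gpu` would select only the `cuda` instance of `test_add`, and `pytest -m "not gpu"` only the `llvm` one.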





