We currently have no automated testing to check that our Docker images are 
healthy, so the images can be broken without anyone noticing. The breakage 
only surfaces when we decide to update the images, at which point it causes 
massive pain (e.g. 
https://github.com/apache/tvm/issues/8177).

Rebuilding our Docker images takes at least a couple of hours on its own, so 
it is currently impractical to rebuild them for every PR or merge.

To give visibility into issues in our Dockerfiles, I'd like to propose an 
automated build that uses our existing infrastructure to regenerate the images 
from scratch once a day. That way problems can be spotted early without 
further increasing the time it takes to validate our PRs, and the work of 
maintaining the images can be spread across the community.

This proposal can be implemented by two independent Jenkins pipelines. Here is 
a summary of what they would do:

1. P1: **daily-docker-images-rebuild:** fetches the latest Dockerfile 
definitions from the TVM repository and rebuilds the images from scratch. If 
successful, uploads the images to a "tlcpack-staging" (provisional name) 
DockerHub account.
2. P2: **daily-docker-image-validate:** pulls the latest images from 
"tlcpack-staging" (provisional name) and runs our existing tests on them, with 
the latest TVM sources.

As mentioned, the pipelines are independent, and **they are not expected to 
make any changes to our production CI** (the images used to run our CI from 
GitHub pull requests) without manual intervention.

In a bit more detail, here is what each pipeline would accomplish:

daily-docker-images-rebuild
---
* This pipeline is triggered by a timer, running once a day
* Fetches the latest TVM sources
* Uses `docker/build.sh` to rebuild all images currently used in CI: `ci_lint`, 
`ci_cpu`, `ci_gpu`, `ci_arm`, `ci_i386`, `ci_wasm` and `ci_qemu`
* Tags each image with two tags: `latest` and a timestamp-based tag 
`YYYY-MM-DD-HH-MM-SS-<short_latest_tvm_git_hash>`
* If successful, uploads them to an account on DockerHub
* In all cases (success or failure), sends a notification somewhere visible 
to the community, e.g. *a Discord channel or mailing list*
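
To make the tagging scheme above concrete, here is a minimal Python sketch, not the draft job itself. It builds the two tags for each image and lists the `docker tag` / `docker push` commands a job could run. Everything beyond the image list and the tag format is an assumption for illustration: the helper names, the `<account>/<image>:<tag>` layout, the use of UTC for the timestamp, and the `tvm.<image>` local image name.

```python
from datetime import datetime, timezone
from typing import List

# Images currently used in CI, as listed in the proposal.
CI_IMAGES = ["ci_lint", "ci_cpu", "ci_gpu", "ci_arm", "ci_i386", "ci_wasm", "ci_qemu"]


def timestamp_tag(built_at: datetime, short_hash: str) -> str:
    """Return a tag shaped like YYYY-MM-DD-HH-MM-SS-<short git hash>."""
    return f"{built_at.strftime('%Y-%m-%d-%H-%M-%S')}-{short_hash}"


def staging_refs(image: str, built_at: datetime, short_hash: str,
                 account: str = "tlcpack-staging") -> List[str]:
    """Return both references for one image: `latest` and the timestamped tag.

    The account name is provisional and the reference layout is an assumption.
    """
    return [
        f"{account}/{image}:latest",
        f"{account}/{image}:{timestamp_tag(built_at, short_hash)}",
    ]


def push_plan(local_image: str, refs: List[str]) -> List[List[str]]:
    """Return the docker commands (as argv lists) that tag and push one image."""
    plan = []
    for ref in refs:
        plan.append(["docker", "tag", local_image, ref])
        plan.append(["docker", "push", ref])
    return plan


if __name__ == "__main__":
    built_at = datetime.now(timezone.utc)  # UTC is an assumption
    for image in CI_IMAGES:
        refs = staging_refs(image, built_at, "abc1234")  # placeholder hash
        # "tvm.<image>" as the local image name is an assumption.
        for command in push_plan(f"tvm.{image}", refs):
            print(" ".join(command))
```

Returning the commands as data instead of running them keeps the sketch easy to dry-run and review before anything is pushed.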

daily-docker-image-validate
---

* This pipeline is triggered by a successful "daily-docker-images-rebuild"
* Fetches the latest TVM sources and runs the existing `tvm/Jenkinsfile` 
pointing at the images generated by the pipeline above
* In all cases (success or failure), sends a notification somewhere visible 
to the community, e.g. *a Discord channel or mailing list*
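
The way the two pipelines chain together can be sketched in a few lines of Python. This only illustrates the control flow described above (validate only after a successful rebuild, notify in every case); the real jobs would be Jenkins pipelines. The function name and the three callables are hypothetical stand-ins.

```python
from typing import Callable


def run_daily_check(rebuild: Callable[[], bool],
                    validate: Callable[[], bool],
                    notify: Callable[[str, bool], None]) -> bool:
    """Run the rebuild, then the validation, notifying after each stage.

    `rebuild` and `validate` return True on success; `notify` receives the
    pipeline name and its outcome. All three are hypothetical stand-ins.
    """
    rebuilt = rebuild()
    notify("daily-docker-images-rebuild", rebuilt)  # sent on success or failure
    if not rebuilt:
        # Validation is only triggered by a successful rebuild.
        return False

    validated = validate()
    notify("daily-docker-image-validate", validated)  # sent on success or failure
    return validated


if __name__ == "__main__":
    def report(name: str, ok: bool) -> None:
        print(f"{name}: {'success' if ok else 'failure'}")

    # Stub stages that always succeed, just to exercise the flow.
    run_daily_check(lambda: True, lambda: True, report)
```

Neither stage touches the images used by the production CI, which matches the requirement that any promotion stays a manual step.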

Next steps
---

I have a draft job that implements "daily-docker-images-rebuild", and I'll be 
posting that in the next few days. In the meantime, I'd like to ask for 
feedback and ideas on how to deal with the issues described here.

cc @areusch @tqchen @ramana-arm @haichen @jroesch @thierry @Lunderberg 
@mbrookhart





  • [Apache TVM Discuss] [Developme... Leandro Nunes (Arm) via Apache TVM Discuss
