Thanks @leandron for the proposal! I agree this will be a great help in 
monitoring the container rebuild process for problems and should reduce the 
headache typically involved with updating containers.

I think we could prototype this first using 
https://discuss.tvm.apache.org/t/ci-how-to-run-your-own-tlcpack-ci-and-proposing-future-improvements-to-https-ci-tlcpack-ai/10123
 so we can iterate without impacting CI runtime, and then migrate it to the 
production TVM CI to run at night, when CI load is lower.

Below I scope out a couple of ideas for future work which may help motivate 
this project.

#### Future work: use autobuilt containers for production TVM CI

I think it would be interesting to implement this and then consider only 
allowing containers built by this process to be promoted to official 
`tlcpack/ci-*` containers. We would likely need some additional work on top of 
this to provide a flexible enough interface (e.g. build selected containers 
on-demand, likely gated to committers) to support this workflow. However, the 
benefit is that all containers would then be built from a known clean revision 
of TVM, so a reproducible build is more likely to occur.

To be sure, this approach doesn't provide 100% reproducibility (the container 
build process pulls in a number of external dependencies, e.g. apt packages, 
LLVM, etc.), but it ensures those dependencies are documented and gives us a 
path to collaborate on future movement in that direction, should we so desire.

#### Future work: Build status dashboard

I think it would be great to also consider creating a concise status dashboard 
that shows a matrix of the build outcomes by container and date. This would 
make it easy to diagnose failures and bisect the range of PRs which may be 
suspect.
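
To make the idea concrete, here is a rough sketch of the data shape I have in mind. Everything in it is hypothetical (the record format, the function names, and the sample dates and outcomes); it only illustrates pivoting nightly results into a container-by-date matrix and reading off the date range to bisect.

```python
from collections import defaultdict

# Hypothetical nightly results: (date, container, build_passed).
SAMPLE_RECORDS = [
    ("2021-06-01", "ci-cpu", True),
    ("2021-06-02", "ci-cpu", False),
    ("2021-06-01", "ci-gpu", True),
    ("2021-06-02", "ci-gpu", True),
]


def build_status_matrix(records):
    """Pivot (date, container, passed) records into {container: {date: passed}}."""
    matrix = defaultdict(dict)
    for date, container, passed in records:
        matrix[container][date] = passed
    return dict(matrix)


def render_matrix(matrix):
    """Render the matrix as plain text, one row per container, one column per date."""
    dates = sorted({d for row in matrix.values() for d in row})
    width = max(len(name) for name in matrix)
    lines = [" " * width + "  " + "  ".join(dates)]
    for name in sorted(matrix):
        cells = []
        for d in dates:
            outcome = matrix[name].get(d)
            symbol = "?" if outcome is None else ("ok" if outcome else "FAIL")
            cells.append(symbol.ljust(len(d)))
        lines.append(name.ljust(width) + "  " + "  ".join(cells))
    return "\n".join(lines)


def suspect_range(matrix, container):
    """Return (last_good_date, first_bad_date) for the first pass-to-fail
    transition of a container, or None if no such transition exists."""
    last_good = None
    for date in sorted(matrix[container]):
        if matrix[container][date]:
            last_good = date
        elif last_good is not None:
            return (last_good, date)
    return None


if __name__ == "__main__":
    m = build_status_matrix(SAMPLE_RECORDS)
    print(render_matrix(m))
    print(suspect_range(m, "ci-cpu"))
```

PRs merged between the two dates returned by `suspect_range` would be the candidates to bisect.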

#### Future work: TVM Python dependencies

https://discuss.tvm.apache.org/t/rfc-python-dependencies-in-tvm-ci-containers/9011
 proposed some efforts to capture the set of Python dependencies used in the CI 
and improve their consistency. With this process in place, we should finally be 
able to build the constraints list of x86_64 dependencies. This would allow us 
to ensure that Python packages in ci-cpu, ci-gpu, and ci-lint match. Mismatched 
packages have been a point of confusion for me when debugging CI failures in 
the past.
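
As a rough illustration of the kind of check this would enable, the sketch below compares `pip freeze`-style output from several containers and reports packages pinned to different versions. The function names and the sample versions are made up for the example; it assumes each container's package list has already been captured as `name==version` text.

```python
def parse_freeze(text):
    """Parse `name==version` lines into {name: version}; other lines are skipped."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def find_mismatches(freezes):
    """Given {container: freeze_text}, return {package: {container: version}}
    for every package installed at differing versions across containers."""
    parsed = {c: parse_freeze(t) for c, t in freezes.items()}
    all_packages = set().union(*parsed.values())
    mismatches = {}
    for pkg in sorted(all_packages):
        versions = {c: pins[pkg] for c, pins in parsed.items() if pkg in pins}
        if len(set(versions.values())) > 1:
            mismatches[pkg] = versions
    return mismatches


if __name__ == "__main__":
    # Hypothetical freeze output; real data would come from each container.
    sample = {
        "ci-cpu": "numpy==1.19.0\npytest==6.2.0\n",
        "ci-gpu": "numpy==1.20.0\npytest==6.2.0\n",
        "ci-lint": "pytest==6.2.0\n",
    }
    for pkg, versions in find_mismatches(sample).items():
        print(pkg, versions)
```

A package missing from one container (as with `numpy` in the hypothetical `ci-lint` data) is not treated as a mismatch here; whether it should be is a policy question for the constraints list.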





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/automated-way-to-health-check-tvm-dockerfiles/10347/2)
 to respond.
