@tqchen Thanks for the reply! A couple of points:

- You're right that I2/I3 don't capture all dependencies. There is a notable 
gap between the set of Python dependencies and the full docker container: the 
remaining (typically system-level) libraries that may be present in the container.
- The method by which those libraries are installed is platform-dependent, but 
so far our approach in the CI has been mostly to depend on the system package 
manager to satisfy them.
- I think that even if a lockfile can't serve as the primary way to reproduce 
the test environment, there are plenty of other uses--here are a couple:
    - Determining which version of a C library is suitable when trying to 
install TVM on a system other than the docker containers. Even when the 
lockfile can't name the C library directly, knowing the version of the Python 
package that wraps it is a useful starting point. I had trouble finding a 
non-dev dependency like this, but I think this pattern is very common; 
[Pillow](https://pillow.readthedocs.io/en/latest/installation.html#external-libraries)
 is an example of such a package.
    - Often a user isn't trying to exactly reproduce the regression--they 
just want to know which minor version of a package to install, e.g. tensorflow 
2.1 vs 2.2. This can make a big difference, especially when the main problem is 
API incompatibility. And even if a user can't reproduce the regression 100%, 
it's often the case that if they're using TVM in an application, they won't be 
able to use the TVM CI containers as a starting point anyway.

### Updating Docker Containers
I think it's worth thinking through the case of updating the docker containers 
a little more because I'm not sure I follow your reasoning here.

The current process is as follows, per-container:
1. Update the `docker/install` scripts and the Dockerfile as needed to modify the container.
2. Run `docker/build.sh` to build the new container image (see the example below).
3. Fix problems and rerun steps 1 and 2 as necessary.
4. Test the containers (discussed later).
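
To make step 2 concrete, a rebuild of a single image looks roughly like this 
(using `ci_cpu` as an example; if I'm remembering the script's interface 
correctly, it also accepts a command to run inside the image, which is handy 
as a quick smoke test):

```bash
# Rebuild one image after editing its Dockerfile or install scripts.
./docker/build.sh ci_cpu

# Optionally, run a test script inside the freshly-built image:
./docker/build.sh ci_cpu tests/scripts/task_python_unittest.sh
```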

The current build process is quite straightforward--it's just building a docker 
container. I2 and I3 propose to add these changes:
1. Prior to building the containers, create a lockfile.
2. Rebuild all containers when the lockfile changes, rather than just updating 
one.

Change 2 shouldn't significantly burden developers unless the build process is 
error-prone. We should fix that if it is, and perhaps a nightly container build 
isn't a bad Jenkins job to configure so we can address these failures as they 
happen.

Change 1 is the one that needs to be thought through. Ordinarily, building a 
lockfile means running a single command on the target platform, and the 
process often takes about a minute, so it would not be a significant burden. 
The tricky part is that we have multiple platforms in the CI (three, in fact: 
manylinux1_i686, manylinux1_x86_64, and manylinux_arm), and it's not possible to 
build the lockfile for a platform other than the one you're running on.
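
As a concrete example, the `pip freeze` flavor of that single command amounts 
to something like the following (the requirements and lockfile paths here are 
hypothetical):

```bash
# Build the lockfile in a clean virtualenv so stray system packages
# don't leak into it.
python3 -m venv /tmp/tvm-lock
. /tmp/tvm-lock/bin/activate
pip install -r python/requirements.txt    # TVM's direct dependencies
pip freeze > ci-lock-manylinux1_x86_64.txt
```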

In thinking this through again, I realized this is actually the same problem 
we have with Windows: it's not possible to use a single lockfile for all 
platforms. If you switch platforms, you have to accept that you may need to 
switch dependency versions too.

So let's return to the ultimate goals of the I2/I3 approach:
1. Ensure all features of TVM can actually be used simultaneously on a given 
platform.
2. Standardize dependencies so that the unit tests don't fail for different 
reasons on different containers.
3. Provide enough input to allow the `tlcpack` build process to place narrower 
constraints on the `install_requires` included there, based on the versions used 
in CI.

In light of the above, all of these goals can be fully achieved only within a 
single pip platform. So I'd propose a new way to install dependencies that 
addresses this, I4:

I4. A multi-part approach:
1. For each pip platform that has a ci- container built around it, produce a 
lockfile either by:
    1. `pip install -r`-ing all TVM requirements and running `pip freeze`, or
    2. running `poetry lock`.

    The lockfile becomes part of the artifacts created when the containers are 
built. When multiple containers are built for the same pip platform (e.g. 
ci-lint, ci-cpu, ci-gpu, ci-qemu, ci-wasm), pick one container to build the 
lockfile. The rest merely consume this lockfile when installing packages, 
either as a pip constraints file or a poetry lockfile (see the first sketch 
after this list).

2. I would argue that between platforms, the version standardization we care 
about is effectively pinning TVM's direct Python dependencies to the minor 
version level. For instance, if we add Windows tests, we'd prefer to test 
against e.g. tensorflow 1.6 on both Linux and Windows. To address this, when 
building containers, we would first build a pilot platform, e.g. 
`manylinux1_x86_64`, scan the resulting lockfile for TVM's direct dependencies, 
and produce a loose constraints file for the remaining platforms (see the 
second sketch after this list). For example, if the lockfile contained the 
entry `tensorflow==1.6.8`, we might add `tensorflow>=1.6,<1.7` to the 
cross-platform constraints file.

   -> This may not always be possible (conda may not update fast enough for us, 
etc.), so we will need a per-platform override file. This is complex, but it's 
likely something like this is happening informally in the release process 
already, and the reasons why different package versions get installed aren't 
documented anywhere.
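
Here's a sketch of part 1 using the `pip freeze` option (container and file 
names are illustrative):

```bash
# On the one container chosen to produce the lockfile (say, ci-cpu):
pip install -r python/requirements.txt
pip freeze > ci-constraints-manylinux1_x86_64.txt   # saved as a build artifact

# On the other containers for the same pip platform (ci-lint, ci-gpu, ...),
# consume the lockfile as a pip constraints file instead of regenerating it:
pip install -c ci-constraints-manylinux1_x86_64.txt -r python/requirements.txt
```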
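
And a sketch of part 2, widening the pilot platform's exact pins into 
minor-version ranges. This is deliberately rough--it assumes simple 
`name==x.y.z` lockfile lines and ignores complications like environment 
markers:

```bash
# Strip comments, blank lines, and version specifiers from requirements.txt,
# leaving just the names of TVM's direct dependencies.
sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' -e 's/[<>=!;].*//' \
  python/requirements.txt > direct-deps.txt

# Keep only those direct dependencies from the pilot lockfile, then widen
# each exact pin to its minor series, e.g.
# tensorflow==1.6.8 -> tensorflow>=1.6,<1.7
grep -f direct-deps.txt ci-constraints-manylinux1_x86_64.txt \
  | awk -F'==' '{ split($2, v, "."); printf("%s>=%s.%s,<%s.%d\n", $1, v[1], v[2], v[1], v[2]+1) }' \
  > cross-platform-constraints.txt
```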

### Checking in new containers

To check in the built containers, these things need to happen:
1. Upload candidate containers to docker hub under your username (sketched 
below).
2. Prepare a TVM PR which updates the Jenkinsfile to reference these test 
containers.
3. Have a TVM committer push the PR to 
[ci-docker-staging](https://github.com/apache/tvm/tree/ci-docker-staging).
4. Await test results and iterate as needed.
5. Push candidate containers to the `tlcpack` docker hub account.
6. Amend the TVM PR to point at the `tlcpack` images and submit it.

Here we're proposing that in step 2, we also include the per-platform lockfiles 
 produced from container building.
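
For reference, steps 1 and 5 are just ordinary docker tag-and-push operations, 
something like this (the image names, account names, and tags are made up):

```bash
# Step 1: publish a candidate image under your own docker hub account.
docker tag tvm.ci_cpu:latest yourname/ci-cpu:vNN-candidate
docker push yourname/ci-cpu:vNN-candidate

# Step 5: once CI is green, push the same image to the tlcpack account.
docker tag tvm.ci_cpu:latest tlcpack/ci-cpu:vNN
docker push tlcpack/ci-cpu:vNN
```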

### `tlcpack` Dependencies

Finally, I4 proposes that we produce a cross-platform constraints file. This 
isn't discussed as part of this RFC (it's the next one), but it would be nice 
to produce a `tlcpack` wheel which includes some loose version constraints 
(e.g. so that, via tlcpack's dependency on tensorflow, a user running 
`pip install tlcpack` would get tensorflow 1.x and not 2.x). The version 
constraints in the cross-platform constraints file are exactly those we should 
use in such a `tlcpack` wheel.

Previously, the follow-on RFC had proposed to analyze the CI lockfile and 
produce the `tlcpack` dependencies from it. I4 just moves that analysis 
earlier, and effectively means that the CI would be testing against the same 
constraints present in the `tlcpack` wheel. That seems like a better approach 
to me.




