Hi,
First, I just wanted to introduce myself to the MXNet community. I'm Joe, and I will be working with Chai and the AWS team on some issues around MXNet CI. One of our goals is to reduce the costs associated with running MXNet CI. The task I'm working on now is this issue:

https://github.com/apache/incubator-mxnet/issues/17802

Proposal: Staggered Jenkins CI pipeline

Based on data collected from Jenkins, around 55% of the time that the mxnet-validation CI build is triggered by a PR, either the sanity or the unix-cpu build fails. When either of these builds fails, it doesn't make sense to run the rest of the pipelines and tie up all those resources, because we've already identified a build or unit test failure.

We are proposing to change the MXNet Jenkins CI pipeline so that the *sanity* and *unix-cpu* builds must complete and pass their tests before any of the other build pipelines (centos-cpu/gpu, unix-gpu, windows-cpu/gpu, etc.) are started. Once these builds complete successfully, the remaining build pipelines will be triggered and run in parallel (as they currently do).

The purpose of this change is to catch faulty code or compatibility issues early and stop any further CI builds from running. This will increase the time required to test a PR, but it will prevent unnecessary builds from running.

Does anyone have any concerns or suggestions about this change?

Thanks,
Joe Evans
