Hi all,

A few people have noted that Mesa's GitLab CI is just too slow to be usable in day-to-day development, which is a massive shame.

I looked into it a bit this morning, and also discussed it with Emil, though nothing here speaks for him. Taking one of the latest runs as representative (nothing in it looks like an outlier to me, and 7min to build RadeonSI seems entirely reasonable): https://gitlab.freedesktop.org/mesa/mesa/pipelines/19692/builds

This run executed 24 jobs, which is beyond the limit of our CI parallelism. As documented on https://www.freedesktop.org/wiki/Infrastructure/ we have 14 concurrent job slots (each with roughly 4 vCPUs). The 24 jobs cumulatively took 177 minutes of execution time, with the end-to-end pipeline taking 120 minutes. 177 minutes of runtime is too long for the runners we have now: even if a pipeline perfectly occupied all our runners, it would take over 12 minutes (177 / 14 ≈ 12.6), which means that even if no-one else were using the runners, they could execute about 5 Mesa builds per hour at full occupancy. Unfortunately, VirGL, Wayland/Weston, libinput, X.Org, IGT, GStreamer, NetworkManager/ModemManager, Bolt, Poppler, etc., would all probably have something to say about that.

When the runners aren't occupied and there's less contention for jobs, it looks quite good: https://gitlab.freedesktop.org/anholt/mesa/pipelines/19621/builds

This run 'only' took 20.5 minutes to execute, but then again, roughly 3 pipelines per hour isn't that great either. Two hours of end-to-end pipeline time is also obviously far too long. Amongst other things, it practically precludes pre-merge CI: by the time your build has finished, someone will have pushed to the tree, so you need to start again. Even if we serialised it through a bot, that would limit us to pushing 12 changesets per day, which seems too low.

I'm currently talking to two different hosting providers to try to get more sponsored time for CI runners. Both conversations are on hold this week due to travel / personal circumstances, but I'll hopefully find out more next week. Eric E filed an issue (https://gitlab.freedesktop.org/freedesktop/freedesktop/issues/120) to enable ccache, but I don't see myself having the time to do it before next month.
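For whoever picks that up, the usual shape of the change is roughly the following - a rough, untested sketch, where the '.build-common' template name (and everything else) is illustrative rather than taken from our actual .gitlab-ci.yml:

    # Rough ccache sketch; untested, names illustrative.
    .build-common:
      variables:
        CCACHE_DIR: "$CI_PROJECT_DIR/.ccache"
      cache:
        key: "$CI_JOB_NAME"
        paths:
          - .ccache/
      before_script:
        # Debian-style ccache symlink farm; adjust for our images.
        - export PATH="/usr/lib/ccache:$PATH"
        - ccache -M 1G

Keying the cache per job name keeps the different build variants from evicting each other's objects, at the cost of some duplication.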
In the meantime, it would be great to see how we could reduce the number of jobs Mesa runs for each pipeline. Given we're already exceeding the limits of our parallelism, having so many independent jobs isn't reducing the end-to-end pipeline time; it's just duplicating the effort required to fetch and check out sources, populate caches (in the future), start the container, run Meson or ./configure, and build any common files.

I'm taking it as a given that at least three separate builds are required: autotools, Meson, and SCons. It's also been suggested to me that SWR should remain separate, as it takes longer to build than the other drivers and getting fast feedback is important, which is fair enough.

Suggestion #1: merge scons-swr into scons-llvm. scons-nollvm will already provide fast feedback on whether we've broken the SCons build, and the rest is pretty uninteresting, so merging scons-swr into scons-llvm might help cut down on duplication.

Suggestion #2: merge the misc Gallium jobs together. The gallium-radeonsi and gallium-st-other builds are both relatively quick. We could merge these into gallium-drivers-other for a very small increase in that job's runtime, and save ourselves probably about 10% of the overall build time (see the first sketch below).

Suggestion #3: don't build so much LLVM in autotools. The Meson clover-llvm builds take half the time the autotools builds do. Perhaps we should build only one LLVM variant with autotools (to check that the autotools LLVM selection still works), and build all the rest only with Meson. That would be good for another 15-20% reduction in overall pipeline run time.

Suggestion #4 (if necessary): build SWR less frequently. Can we perhaps demote SWR to an 'only:' job which rebuilds SWR only when SWR itself or Gallium has changed? (See the second sketch below.) This would save a good chunk of runtime - again close to 10%.
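For suggestion #2, the merge could be as small as folding the two job definitions into one. A sketch only, assuming our Meson jobs pick drivers and state trackers via variables along these lines - the names and lists here are illustrative, not copied from our CI config:

    # Hypothetical merged job replacing gallium-radeonsi and
    # gallium-st-other; variable names and lists are illustrative.
    gallium-drivers-other:
      extends: .meson-build
      variables:
        GALLIUM_DRIVERS: "r300,r600,radeonsi,nouveau,swrast,virgl"
        GALLIUM_ST: "dri,xa,vdpau,omx,va"

(extends: needs GitLab 11.3+; a YAML anchor does the same job on older versions.)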
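For suggestion #4, recent GitLab (11.4+, I believe) can express exactly that with only:changes:. Again just a sketch - the path globs are illustrative and would need checking against the tree layout:

    # Hypothetical: skip rebuilding SWR unless Gallium (which contains
    # SWR) or the SCons build itself has been touched.
    scons-swr:
      only:
        changes:
          - src/gallium/**/*
          - scons/**/*
          - SConstruct

IIRC GitLab can't compute a diff for a brand-new branch and simply runs the job in that case, which seems like a safe fallback.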
Doing the above would reduce the run time fairly substantially for, as far as I can tell, no loss in functional coverage, and would bring the parallelism down to a mere 1.5x oversubscription of the whole organisation's available job slots, from the current nearly 2x.

Any thoughts?

Cheers,
Daniel