On Monday, 2019-02-18 17:31:41 +0000, Daniel Stone wrote:
> Hi all,
> A few people have noted that Mesa's GitLab CI is just too slow, and not
> usable in day-to-day development, which is a massive shame.

Agreed :/

> I looked into it a bit this morning, and also discussed it with Emil,
> though nothing in this is speaking for him.
>
> Taking one of the last runs as representative (nothing in it looks like
> an outlier to me, and 7min to build RadeonSI seems entirely reasonable):
> https://gitlab.freedesktop.org/mesa/mesa/pipelines/19692/builds
>
> This run executed 24 jobs, which is beyond the limit of our CI
> parallelism. As documented on
> https://www.freedesktop.org/wiki/Infrastructure/ we have 14 concurrent
> job slots (each with roughly 4 vCPUs). Those 24 jobs cumulatively took
> 177 minutes of execution time, taking 120 minutes for the end-to-end
> pipeline.
>
> 177 minutes of runtime is too long for the runners we have now: even if
> it perfectly occupied all our runners it would take over 12 minutes,
> which means that even if no-one else was using the runners, they could
> execute 5 Mesa builds per hour at full occupancy. Unfortunately, VirGL,
> Wayland/Weston, libinput, X.Org, IGT, GStreamer,
> NetworkManager/ModemManager, Bolt, Poppler, etc, would all probably
> have something to say about that.
>
> When the runners aren't occupied and there's less contention for jobs,
> it looks quite good:
> https://gitlab.freedesktop.org/anholt/mesa/pipelines/19621/builds
>
> This run 'only' took 20.5 minutes to execute, but then again, 3
> pipelines per hour isn't that great either.
>
> Two hours of end-to-end pipeline time is also obviously far too long.
> Amongst other things, it practically precludes pre-merge CI: by the
> time your build has finished, someone will have pushed to the tree, so
> you need to start again. Even if we serialised it through a bot, that
> would limit us to pushing 12 changesets per day, which seems too low.
>
> I'm currently talking to two different hosts to try to get more
> sponsored time for CI runners. Those are both on hold this week due to
> travel / personal circumstances, but I'll hopefully find out more next
> week. Eric E filed an issue
> (https://gitlab.freedesktop.org/freedesktop/freedesktop/issues/120) to
> enable a ccache cache, but I don't see myself having the time to do it
> before next month.

Just to chime in on this point, I also have an MR to enable ccache per
runner, which with our static runners setup is not much worse than the
shared cache:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/240
From my cursory testing, this should already cut the compilations by
80-90% :)
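As an aside, for anyone wondering what "ccache per runner" amounts to in
practice: it roughly boils down to mounting a persistent host directory
into the job containers via each runner's config.toml and pointing
CCACHE_DIR at it from the CI config. A minimal sketch follows (the runner
name, mount paths, cache size and job name are made up, and this isn't
necessarily exactly what the MR above does; CCACHE_DIR and
CCACHE_COMPILERCHECK themselves are standard ccache variables):

    # /etc/gitlab-runner/config.toml on each static runner
    # (illustrative; url/token etc. omitted)
    [[runners]]
      name = "fdo-runner-1"                # hypothetical runner name
      executor = "docker"
      [runners.docker]
        # keep a host directory around between jobs so the cache persists
        volumes = ["/var/cache/ccache:/ccache:rw"]

    # .gitlab-ci.yml (sketch)
    variables:
      CCACHE_DIR: /ccache                  # point ccache at the mounted volume
      CCACHE_COMPILERCHECK: content        # hash compiler content, not mtimes

    some-build-job:
      script:
        - export PATH="/usr/lib/ccache:$PATH"  # compiler wrappers on Debian
        - ccache --max-size=10G
        - ccache --zero-stats
        - meson build/ && ninja -C build/      # build as usual
        - ccache --show-stats                  # hit rate shows up in the job log

Since each machine keeps its own cache, the first build scheduled on a
given runner is still cold; after that, repeat builds should mostly hit.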
> In the meantime, it would be great to see how we could reduce the
> number of jobs Mesa runs for each pipeline. Given we're already
> exceeding the limits of parallelism, having so many independent jobs
> isn't reducing the end-to-end pipeline time, but instead just
> duplicating the effort required to fetch and check out sources, cache
> (in the future), start the container, run meson or ./configure, and
> build any common files.
>
> I'm taking it as a given that at least three separate builds are
> required: autotools, Meson, and SCons.

Fair enough.

> It's been suggested to me that SWR should remain separate, as it takes
> longer to build than the other drivers, and getting fast feedback is
> important, which is fair enough.
>
> Suggestion #1: merge scons-swr into scons-llvm. scons-nollvm will
> already provide fast feedback on whether we've broken the SCons build,
> and the rest is pretty uninteresting, so merging scons-swr into
> scons-llvm might help cut down on duplication.
>
> Suggestion #2: merge the misc Gallium jobs together. gallium-radeonsi
> and gallium-st-other are both relatively quick to build. We could merge
> these into gallium-drivers-other for a very small increase in overall
> runtime for that job, and save ourselves probably about 10% of the
> overall build time here.
>
> Suggestion #3: don't build so much LLVM in autotools. The Meson
> clover-llvm builds take half the time the autotools builds do. Perhaps
> we should only build one LLVM variant within autotools (to test that
> the autotools LLVM selection still works), and then build all the rest
> only in Meson. That would be good for another 15-20% reduction in
> overall pipeline run time.
>
> Suggestion #4 (if necessary): build SWR less frequently. Can we perhaps
> demote SWR to an 'only:' job which will only rebuild SWR if SWR itself
> or Gallium have changed? This would save a good chunk of runtime -
> again close to 10%.
>
> Doing the above would reduce the run time fairly substantially, for
> what, as far as I can tell, is no loss in functional coverage, and
> bring the parallelism to a mere 1.5x oversubscription of the whole
> organisation's available job slots, from the current 2x.
>
> Any thoughts?

Your suggestions all sound good, although I can't speak for #1 and #2.

#3 sounds good; I guess we can keep Meson builds with the "oldest
supported llvm" and the "current llvm version", and only the "oldest
supported" for autotools?

I think suggestion #4 (tracking which files actually affect the build)
would be good for all of them, but it quickly gets complicated to keep up
to date. I guess those that have trivial files to track should get that
treatment, and we can leave the complicated ones as-is.
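On the "oldest supported" vs "current" LLVM idea, the Meson side could be
structured as two jobs that share a template and differ only in which
llvm-config they expose. This is only a sketch: the template name, the
LLVM_VERSION values and the assumption that the container image ships
Debian-style llvm-config-<N> binaries are all illustrative.

    .meson-llvm-build:
      script:
        # make the requested llvm-config the one Meson finds first
        - ln -sf "$(which llvm-config-${LLVM_VERSION})" /usr/local/bin/llvm-config
        - meson build/ -Dllvm=true -Dgallium-drivers=radeonsi
        - ninja -C build/

    meson-llvm-oldest:
      extends: .meson-llvm-build
      variables:
        LLVM_VERSION: "3.9"             # stand-in for "oldest supported"

    meson-llvm-current:
      extends: .meson-llvm-build
      variables:
        LLVM_VERSION: "7"               # stand-in for "current"

Autotools would then keep a single job pinned to the oldest version, just
to prove that its LLVM selection logic still works.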
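To make #4 concrete, GitLab's 'only: changes:' filter (available since
GitLab 11.4) is probably the mechanism we'd use. A rough sketch, where the
'.gallium-swr-build' template name and the path list are purely
illustrative (and the path list is exactly the part that would need to be
kept up to date):

    gallium-swr:
      extends: .gallium-swr-build     # hypothetical existing build template
      only:
        changes:
          # rebuild only when SWR/Gallium sources or the build system change
          - src/gallium/**/*
          - meson.build
          - .gitlab-ci.yml

One caveat: when there's nothing to diff against (e.g. the first push of a
new branch), GitLab treats 'changes:' as matching everything, so such
pipelines would still build SWR once.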
---

You've suggested reducing the amount that's built (ccache,
dropping/merging jobs) and making better use of the parallelism we have
(fewer jobs), but there's another avenue to look at: run the CI less
often.

In my opinion, the CI should run on every single commit. Since that's not
realistic, we need to decide what's essential. From most to least
important:

- master: everything that hits master needs to be build- and smoke-tested
- stable branches: we obviously don't want to break stable branches
- merge requests: the reason I wrote the CI was to automatically test MRs
- personal work on forks: it would be really useful to test things before
  sending out an MR, especially with the less-used build systems that we
  often forget to update, but this should be opt-in, not opt-out as it is
  right now.

Ideally, this means we add this to the .gitlab-ci.yml:

    only:
      - master
      - merge_requests
      - ci/*

Until this morning, I thought `merge_requests` was an Enterprise
Edition-only feature, which is why I didn't put it in, but it appears I
was wrong; see:
https://docs.gitlab.com/ce/ci/merge_request_pipelines/
(Thanks Caio for reading through the docs more carefully than I did! :)

I'll send an MR in a bit with the above. This will mean that master and
MRs get automatic CI, and pushes on forks don't (except the fork's
master), but one can push a `ci/*` branch to their own fork to run the CI
on it.

I think this should massively cut down on CI usage, while mostly removing
only the unwanted uses :)

Cheers,
  Eric

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev