Hi all,

From time to time I get questions about Coin, and about why we started 
developing our own CI system for Qt instead of using an available solution.

To understand that, we need to go back a few years to the time when we 
started developing it. At that point, we had a Jenkins-based CI system that 
was giving us quite a few problems. Among them:

* We had lots of stability issues with the system. Much later, we saw that some 
of those issues were problems in the networking and virtualisation layer, but 
we didn’t know this at that time.
* CI machines were constantly running, making it very hard to balance resource 
requirements and leading to rather bad hardware utilisation. 
* The long-running CI machines could easily accumulate a lot of garbage (again 
leading to instabilities and hard-to-debug problems).
* We could only ever do one release at a time, as switching branches required 
us to switch the VM templates.
* We couldn’t deal with modularised repositories in a decent way, so we always 
had to compile all dependencies from scratch, leading to extremely long 
turnaround times.
* The branch configurations were managed by hand, sometimes leading to problems 
when creating new branches.
* CI and packaging were disconnected, so we had to compile everything once 
again from scratch to create binary packages (with slightly different 
configurations). This often led to build errors during packaging that weren’t 
visible in CI. In addition, this wasted a lot of time and resources. The 
minimum turnaround time from a fix being staged on Gerrit until it ended up in 
a package was 48 hours.
* Lack of provisioned build/test VMs.
* Developers couldn’t access the build/test VMs themselves for debugging (now 
we can at least provide it to people working at TQtC)
* We didn’t have any tests for the CI system itself, making it difficult to 
change and maintain.
* There were probably other issues that I’ve forgotten about now…

So some years ago we sat down to work out how best to solve those problems. 
We looked at a variety of existing solutions and whether they could solve the 
problems we were having. In the end we came to the conclusion that none of the 
solutions that existed at that point in time were what we really needed and 
wanted. 

That left us with the option of either implementing a lot of new functionality 
for an existing system or building our own solution. We ended up going for our 
own, as we couldn’t see how to easily bring the existing solutions to where we 
wanted them to be.

That turned out to be a larger effort than we initially estimated. 
Nevertheless, Coin is nowadays a much better system than what we had some years 
ago with Jenkins. Most of the problems mentioned above are solved today.

So where are we with Coin today? Let’s have a quick overview of what we have, 
and of the remaining issues we’re seeing:

The CI system contains several layers. As the basis, we have a cluster of 
rather powerful blades and a large set of Macs. Those are running inside a 
separate DMZ inside the Qt Company’s network. Each of those blades runs Linux 
with KVM as the hypervisor. The whole cluster is administered through 
OpenNebula/MAAS.

Coin itself runs on a separate powerful machine and brings up VMs as needed 
through OpenNebula. It listens to staging requests from Gerrit and contains all 
the logic to determine how and on which platforms to test a set of changes (the 
list of platforms and how they are provisioned is stored in qt5.git). In 
addition, it has a large storage area where we cache generated binary artefacts 
for the different repositories/branches/sha1s. Those are used to test changes 
in dependent repositories, so we don’t need to compile qtbase every time. The 
same binary artefacts are also used to create our binary packages.
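
As a rough illustration of the caching idea described above (this is not 
Coin's actual code; the class name, the in-memory store and the `build` 
callback are all hypothetical), artefacts can be looked up by 
repository/branch/sha1 and only built when no cached copy exists:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical cache key mirroring the "repositories/branches/sha1s"
# layout mentioned in the text: (repository, branch, sha1).
CacheKey = Tuple[str, str, str]


@dataclass
class ArtefactCache:
    """Toy in-memory stand-in for an artefact storage area (an assumption)."""

    store: Dict[CacheKey, bytes] = field(default_factory=dict)
    builds: int = 0  # how many times we actually had to compile

    def get_or_build(
        self, repo: str, branch: str, sha1: str, build: Callable[[], bytes]
    ) -> bytes:
        """Return cached artefacts for the key, building them only on a miss."""
        key = (repo, branch, sha1)
        if key not in self.store:
            self.store[key] = build()
            self.builds += 1
        return self.store[key]


if __name__ == "__main__":
    cache = ArtefactCache()
    # First request for this qtbase revision triggers a (pretend) build.
    cache.get_or_build("qtbase", "dev", "abc123", lambda: b"qtbase-binaries")
    # A dependent repository testing against the same sha1 reuses the result.
    cache.get_or_build("qtbase", "dev", "abc123", lambda: b"never-called")
    print("builds performed:", cache.builds)
```

The point of keying on the sha1 is that a change in a dependent repository 
can reuse the exact qtbase binaries it was staged against, and packaging can 
pick up the same binaries instead of recompiling.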

But as you know, it’s certainly not perfect, and we have our regular share of 
bugs and problems with the system. Coin itself is actually running pretty 
nicely and doesn’t generate too many problems. We have decent control over it, 
and most bugs that we notice in the Coin codebase itself are not too hard to 
fix.

There are however a couple of other issues that are still creating problems for 
us:

* The network was causing lots of problems: we were seeing random packets being 
dropped and random disconnects of TCP connections. We made some changes here 
last week, and are optimistic that this has now been fixed.

* Windows 10 VMs are sometimes extremely slow when being run on top of our 
current host Linux/KVM combination. The root cause has still not been fully 
identified, but we are currently working on upgrading the host OS to a newer 
Ubuntu version. Judging from similar bug reports by others, there’s a good 
chance that this will resolve the problem.

* Flaky tests are a recurring problem. We’ve spent a lot of time trying to 
identify them and fix tests that rely on specific timing or other 
non-deterministic behaviour. A second source of flakiness comes from the 
underlying system, something we hope will be resolved with fixes for the two 
points above. Another issue is maintenance of the VMs and the fact that we have 
to be careful that those machines don’t start doing heavy work (such as auto 
updates) on their own.

* We still have some issues in the interaction between Coin and OpenNebula, 
where Coin fails to acquire machines and Tier2 images get corrupted. This is 
being worked on by the CI team.

Coin’s software development is now moving towards being able to build, test 
and package not only Qt, but also the other products we have (Qt Creator, 
3D Studio, Design Studio, Automotive, etc.). This will also make it a lot 
easier to have additional frameworks that are not part of qt5.git tested and 
packaged in Coin.

We should continue to evaluate alternatives from time to time, but currently 
Coin is the best option we have for our CI.

Cheers,
Lars

_______________________________________________
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development
