Replying here to some comments on IRC, since I'm rarely online at the same time as the others, but I don't want to let all the comments go unanswered...

> steveire> [06:32:44] CI is seriously depresssing. For the last 24 hours
> there has been one successful merge. Many of the others are failing
> because of something in network.
> richmoore1> [06:40:03] steveire: a lot of this seems to be caused by the
> moving of the CI infrstructure to digia. it doesn't seem to be working
> fully yet

I don't think that's true. As far as I know, the projects migrated to
Digia have been working fine, and only the Nokia system (which has lost
the majority of its support staff) has been having problems.

The Nokia qt-test-server kernel recently started to produce
"kernel: [14564774.569761] swapper: page allocation failure. order:4, mode:0x4020"
after 377 days of uptime with few problems. I don't think this is
directly related to any Nokia -> Digia transition activities; rather, it
is an unfortunate coincidence of timing.

> ThorbjornTux> [06:54:20] steveire: there was a discussion not that long ago
> in here ... I think that the conclusion were that tests that failed without
> reason was ok to be marked as insignificant .... (as you suggested).
> ThorbjornTux> [06:54:47] steveire: the problem seems only to be if anybody
> does it ..
> steveire> [06:55:10] Exactly. There used to be people who did things like
> that.

True, there used to be Nokia employees reading every failure report and
chasing up apparently unstable tests, either trying to fix the tests, or
acknowledging them via bug reports and marking them insignificant.

Those people are gone and the test results are likely to be less stable
until they're replaced - either by more people doing the same job, or by
an automated solution to achieve stable test results from an unstable
product.

> jpnurmi> [07:30:23] steveire: np, those tests have been annoying me several
> times :)
> steveire> [07:31:01] Yes. But why did I get so much pushback on fixing it?
> Something for qt-project to think about.
> sahumada> [07:31:35] because you are not fixing it .. you are hiding it :)
> steveire> [07:32:23] I'm fixing the problem that nothing has any chance of
> integrating.
> With your attitude, insignificant_test and QSKIP would not exist or be needed.

I think it's great to have more people actively doing something about
failing tests, as long as they take responsibility for their actions.
The alternative, waiting for "someone" to do something whenever you see
a flaky test, is not going to work (any more?).

It might be good to have some guidelines about the best ways to handle
flaky tests, since there are several options.

> torarne> [09:18:47] anyone got powers to put things into qt5.git
> without the ci getting in the way?

There's no built-in mechanism to bypass the system. We haven't needed
one so far; we've always managed to handle problems as they arise.

If it were an acceptable option to bypass the system when problems
occur, it seems to me it would greatly reduce the incentive to fix the
problems.

> <steveire> [10:34:22] Right. Anyone who can do anything doesn't really
> care. This is the kind of thing that should be fixable quickly

The first part is false. I care, and I can do something - just not at
the time you've reported the issue (although I was probably awake, I
made a choice a while ago to minimize time spent fixing problems outside
of normal working hours, because I felt it was burning me out).

Actually, every CI failure which is not related to any of the changes
under test slightly erodes my soul. I can guarantee I've been at least
as frustrated as any users of the system, during its most unreliable
times.

The latter part is true and the problem was fixed quickly (for some
subjective value of "quickly"), once it was known.

> richmoore1> [10:36:01] doing CI from one side of the world to the other
> is optimistic

Yup, we used to have the Pulse server and all clients located together
in Brisbane. The migration to Jenkins meant the server was moved to
Europe. We have suffered a little bit from that.
Luckily, this will soon be over; just a few more days and everything
will be operating out of Europe.

> steveire> [10:36:50] And yet, there's been no communication on the mailing
> list about the network problems (affecting everyone staging anything),
> despite the fact that it's been known since Monday at least.
> <steveire> [10:39:19] The insignification should have been done on
> Monday imo

I didn't understand this part. No problem has been known about since
Monday; that seems to be a false assumption.

The specific network problem you're complaining about was reported to
JIRA by you, last night at 9pm my time, and fixed by me within the first
30 minutes of my working day today.

Reporting problems greatly increases the likelihood of a timely fix.
You'll be able to get technical support within your own timezone once
the transition to Digia completes.

Please note that several days of instability doesn't imply several days
of the same problem going unfixed. In the last few days I've also been
debugging mysterious OOM conditions from the kernel on some Linux
builders, metacity crashes caused by Qt autotests (which do not
themselves fail but cause later tests to fail), and exacerbation of
these conditions by test machines mysteriously failing to reboot
themselves between builds.

So, although it might look from your point of view as if there have been
several days of "generic instability" with no activity, in fact there
are a few different things going on.

> torarne> [10:42:41] what about blames, bisecting integration runs, incremental
> builds, testing subsets of tests, single-patch integrations, paralysing
> building and testing, any work in those areas?

Not that I'm aware of. Unfortunately in the Nokia times, the standing
directive from management (at least for the last 1-2 years) was to spend
the minimum time possible on the CI system. That's why it had virtually
no feature improvements in that time.
(The last notable feature added was to allow tests marked with
CONFIG+=parallel_test to run in parallel.)

I hope Digia will put more resources toward improving the system rather
than merely maintaining it as-is.

=====

I know it's frustrating to have some tool blocking your work and not
being able to do much about it. Maybe this is why discussions about the
CI so often veer into toxic semi-rants and baseless assumptions.

Please do try to make a conscious effort to avoid this, because it acts
as a disincentive to work on the system. This kind of thing is probably
one of the reasons why sysadmins tend to stay aloof from developers.

_______________________________________________
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development