Hi,

On 09-04-2025 21:42, Jeremy Bícha wrote:
> On Wed, Apr 9, 2025 at 1:33 PM Paul Gevers <elb...@debian.org> wrote:
>> My personal standard (but with Release Team hat in mind) is that I file RC bugs about flakiness if a test fails more than about 1 out of 6 times (on a particular architecture if it's architecture-specific).
>
> Sorry, I didn't have the full context of the thread when I replied yesterday. I should have delayed sending the response until I did.
>
> What if the failures happen 100% of the time on someone's AWS instances but reliably pass on official Debian infrastructure?
That's not what I mean by flaky.

For some background: in private communications with Santiago, he has been advocating that FTBFS bugs which happen reliably on 1-cpu hosts be declared RC. From what I've seen so far in his reports, the 1-cpu case occasionally exposes bugs that are otherwise hidden on the official buildds and on whatever the maintainer uses for test builds (their own machine, Salsa, etc.). Using 1-cpu hosts is therefore a valuable way to test. On the other hand, 1-cpu hosts are not what most developers (and, I assume, users) use, and they are not what we use on the buildds either. Hence I can also relate to maintainers who think the 1-cpu case is just odd. As a result, I have declined to back him in filing this type of 1-cpu FTBFS bug at RC severity, and I suggested filing these bugs at severity important instead. I have told him, however, that I do expect maintainers to accept reasonable (and hence maintainable) patches, which ideally should of course go upstream. So I suggested that he provide patches with the 1-cpu reports he files, since the 1-cpu case is important to him. I've told Santiago multiple times that I greatly appreciate his QA rebuilds (including the 1-cpu ones).
So, back to this case. The original report (1057562) was filed at severity serious and didn't mention the 1-cpu case. Jeremy claimed flakiness and lowered the severity. A Release Manager later raised it again and asked that the flaky test be avoided (fixed or disabled) because the failure had been seen on the buildds. Later on (after message 86), the severity discussion becomes more difficult. Changes were made that probably lowered the chance of the failure on hosts with more than one CPU, and questions arose about the definition of flakiness and about the statistics on the buildds.

My position, as a Release Team member, is as follows. As mentioned earlier, flakiness in my book is a serious problem if the failure rate is above roughly 1 out of 6 (about 17%). It's an important problem if the failure occurs less often. On their own, 1-cpu FTBFS bugs are important issues, but not serious ones. In this case, the FTBFS is due to a particular test, and luckily tests can be disabled during the build. The test fails reliably in the 1-cpu case and, when tested by Santiago on a 2-cpu system, failed in 8% of runs. By my thresholds above, that 8% is not RC. However, because the FTBFS is caused by a single test, I do ask the maintainers of gcr4 (and gcr) to disable that particular test during the build until the underlying problem has been fixed. The patch in message 121 is supposed to do exactly that.
Paul