I ought to clarify: we did actually have green CI, modulo 3 flaky tests, on our internal CI system. I've attached the test artefacts to CASSANDRA-18330 now[1][2]: 2 of the 3 failures are upgrade dtests, with 1 other Python dtest failure noted. None of these were reproducible in a dev setup, so we suspected them to be environmental and intended to merge before returning to confirm that. The "known" failures that we mentioned in the email that started this thread were ones observed by Mick running the cep-21-tcm branch through Circle before merging.
As the CEP-21 changeset was approaching 88k LoC touching over 900 files, permanently rebasing as we tried to eradicate every flaky test was simply unrealistic, especially as other significant patches continued to land in trunk. With that in mind, we took the decision to merge so that we could focus on actually removing any remaining instability.

[1] https://issues.apache.org/jira/secure/attachment/13064727/ci_summary.html
[2] https://issues.apache.org/jira/secure/attachment/13064728/result_details.tar.gz

> On 27 Nov 2023, at 10:28, Berenguer Blasi <berenguerbl...@gmail.com> wrote:
>
> Hi,
>
> I have written this email like 10 times before sending it and I can't manage
> to avoid making it sound with a negative spin to it. So pardon my English or
> poor choice of words in advance and try to read it in a positive way.
>
> It is really demotivating to me seeing things getting merged without green
> CI. I had to go through an herculean effort and pain (at least to me) to keep
> rebasing the TTL patch continuously (a huge one imo) when it would have been
> altogether much easier to merge, post-fix and post-add downgradability along
> the TCM merge lines.
>
> If this merge-post fix approach is a thing I would like it clarified so we
> can all benefit from it and to avoid the big-patch rebase pain.
>
> Regards
>
> On 27/11/23 10:38, Jacek Lewandowski wrote:
>> Hi,
>>
>> I'm happy to hear that the feature got merged. Though, I share Benjamin's
>> worries about that being a bad precedent.
>>
>> I don't think it makes sense to do repeated runs in this particular case.
>> Detecting flaky tests would not prove anything; they can be caused by this
>> patch, but we would not know that for sure. We would have to have a similar
>> build with the same tests repeated to compare. It would take time and
>> resources, and in the end, we will have to fix those flaky tests regardless
>> of whether they were caused by this change.
>> IMO, it makes sense to do a repeated run of the new tests, though. Aside from
>> that, we can also consider making it easier and more automated for the
>> developer to determine whether a particular flakiness comes from a feature
>> branch one wants to merge.
>>
>> thanks,
>> Jacek
>>
>>
>> On Mon, 27 Nov 2023 at 10:15, Benjamin Lerer <ble...@apache.org> wrote:
>>> Hi,
>>>
>>> I must admit that I have been surprised by this merge and this following
>>> email. We had lengthy discussions recently and the final agreement was that
>>> the requirement for a merge was a green CI.
>>> I could understand that for some reasons as a community we could wish to
>>> make some exceptions. In this present case there was no official discussion
>>> to ask for an exception.
>>> I believe that this merge creates a bad precedent where anybody can feel
>>> entitled to merge without a green CI and disregard any previous community
>>> agreement.
>>>
>>> On Sat, 25 Nov 2023 at 09:22, Mick Semb Wever <m...@apache.org> wrote:
>>>>
>>>> Great work Sam, Alex & Marcus !
>>>>
>>>>
>>>>> There are about 15-20 flaky or failing tests in total, spread over
>>>>> several test jobs[2] (i.e. single digit failures in a few of these). We
>>>>> have filed JIRAs for the failures and are working on getting those fixed
>>>>> as a top priority. CASSANDRA-19055[3] is the umbrella ticket for this
>>>>> follow up work.
>>>>>
>>>>> There are also a number of improvements we will work on in the coming
>>>>> weeks, we will file JIRAs for those early next week and add them as
>>>>> subtasks to CASSANDRA-19055.
>>>>
>>>>
>>>> Can we get these tests temporarily annotated as skipped while all the
>>>> subtickets to 19055 are being worked on ?
>>>>
>>>> As we have seen from CASSANDRA-18166 and CASSANDRA-19034 there's a lot of
>>>> overhead now on 5.0 tickets having to navigate around these failures in
>>>> trunk CI runs.
>>>>
>>>> Also, we're still trying to figure out how to do repeated runs for a patch
>>>> so big… (the list of touched tests was too long for circleci, i need to
>>>> figure out what the limit is and chunk it into separate circleci configs)
>>>> … and it probably makes sense to wait until most of 19055 is done (or
>>>> tests are temporarily annotated as skipped).
>>>>
>>>>