+1 to everything Jeff said. As someone who has worked on flaky tests, and not just in Cassandra's context, I know they can be hard to deal with. However, it's best to root-cause them. I have found that some flaky tests pointed to genuine issues that needed fixing in Cassandra.

Sometimes the flakiness is due to underpowered VMs running low on resources; in one case, tests failed because kernel settings differed between systems. Explore tuning the settings of the VMs used for test execution. I generally prefer not to add retries, but in some cases they can be helpful. Rewriting tests to reduce dependencies on external systems, or using mocks, is another useful way to reduce flakiness. Try breaking up tests if they're too big. Finally, deleting tests can also be a solution, but use it sparingly.

I believe in the broken windows theory, so it is critical that you spend time fixing flaky tests. Otherwise everyone ignores them and attributes all failures to "flakiness", which lets real issues sneak in.

Dinesh
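
To illustrate the point above about reducing dependencies on external systems, here is a minimal sketch, not taken from Cassandra's or Kafka's test suites. It assumes a test that polls for some condition with a timeout; `FakeClock` and `wait_until` are hypothetical names introduced only for this example. Injecting the clock means the test no longer depends on real wall-clock time or on how loaded the CI machine is.

```python
class FakeClock:
    """Hypothetical test double: time only advances when sleep() is called."""

    def __init__(self):
        self.now = 0.0

    def monotonic(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def wait_until(condition, timeout, clock, interval=0.5):
    """Poll condition() until it returns True or `timeout` seconds elapse on `clock`.

    `clock` is any object with monotonic() and sleep(); production code could
    pass the standard `time` module, while tests pass a FakeClock.
    """
    deadline = clock.monotonic() + timeout
    while True:
        if condition():
            return True
        if clock.monotonic() >= deadline:
            return False
        clock.sleep(interval)


if __name__ == "__main__":
    clock = FakeClock()
    polls = []

    def ready_on_third_poll():
        polls.append(1)
        return len(polls) >= 3

    # Runs instantly and deterministically, regardless of machine load.
    print(wait_until(ready_on_third_poll, timeout=5, clock=clock))
```

With a fake clock, the timeout path can also be tested without actually waiting, which removes one common source of timing-related flakiness.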

On Tuesday, February 26, 2019, 12:06:10 PM PST, Jeff Jirsa <jji...@gmail.com> wrote:

> On Feb 26, 2019, at 8:26 AM, Stanislav Kozlovski
> <stanislav_kozlov...@outlook.com> wrote:
>
> Hey there Cassandra community,
>
> I work on a fellow open-source project - Apache Kafka - and there we have
> been fighting flaky tests a lot. We run Java 8 and Java 11 builds on every
> Pull Request and due to test flakiness, almost all of them turn out red with
> 1 or 2 tests (completely unrelated to the change in the PR) failing. This has
> resulted in committers ignoring them and merging the changes either way, or
> in the worst case - rerunning the hour-long build until it becomes green.

I hope most committers won't commit unless the flaky test is definitely not in the subsystem they touched. But yes, one of the motivations for speeding up tests (parallelized on a containerized hosted CI platform) was to cut down the time for (re-)running.

> This test flakiness has also slowed down our releases significantly.
>
> In general, I was just curious to understand if this is a problem that
> Cassandra faces as well.

Yes.

> Does your project have a lot of intermittently failing tests,

Sometimes more than others. There were a few big pushes to get green, though it naturally regresses a bit over time.

> do you have any active process of addressing such tests (during the initial
> review, after realizing it is flaky, etc). Any pointers will be greatly
> appreciated!

I don’t think we’ve solved this convincingly. Different large (corporate) contributors have done long one-time passes, and that helped a ton, but I don’t think there are any silver bullets yet.