Re: How Apache Cassandra handles flaky tests

dinesh.jo...@yahoo.com.INVALID Tue, 26 Feb 2019 16:56:31 -0800

+1 to everything Jeff said. As someone who has worked on flaky tests, and not 
just in Cassandra's context, I know they can be hard to deal with. 
However, it's best to root-cause them. I have found that some flaky tests were 
genuine issues that needed fixing in Cassandra. Sometimes the flakiness is due 
to underpowered VMs running low on resources; in one case, tests failed because 
kernel settings differed between systems. Explore tuning the VM settings 
used for test execution. I usually prefer not to add retries, but in some 
cases retries can be helpful. Rewriting tests to reduce dependencies on 
external systems, or using mocks, is another useful way to reduce 
flakiness. Try breaking up tests if they're too big. Finally, deleting tests can 
also be a solution, but use it sparingly. 
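
To make the point about reducing outside dependencies concrete, here is a 
minimal sketch in Python. It is not taken from the Cassandra code base: 
TtlCache and FakeClock are hypothetical names invented for illustration. The 
idea is that a test which sleeps and reads the wall clock can fail on a slow 
VM, while a test that injects its own clock behaves the same on every run.

```python
import time


class TtlCache:
    """Hypothetical cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injected so that tests can control time
        self._entries = {}

    def put(self, key, value):
        # Store the value together with its expiry timestamp.
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the expired entry
            return None
        return value


class FakeClock:
    """Test double (hypothetical): time moves only when the test says so."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def test_entry_expires_after_ttl():
    clock = FakeClock()
    cache = TtlCache(ttl_seconds=30, clock=clock)
    cache.put("k", "v")

    clock.advance(29)  # one second before expiry
    assert cache.get("k") == "v"

    clock.advance(1)   # exactly at expiry
    assert cache.get("k") is None
```

The test never sleeps, so it does not depend on how loaded the machine is. 
Production code keeps the default clock and is otherwise unchanged.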
I believe in the broken windows theory, so it is critical that you spend time 
fixing flaky tests; otherwise everyone ignores them and attributes all failures 
to "flakiness", which lets real issues sneak in.
Dinesh


    On Tuesday, February 26, 2019, 12:06:10 PM PST, Jeff Jirsa 
<jji...@gmail.com> wrote:  
 
 


> On Feb 26, 2019, at 8:26 AM, Stanislav Kozlovski 
> <stanislav_kozlov...@outlook.com> wrote:
> 
> Hey there Cassandra community,
> 
> I work on a fellow open-source project - Apache Kafka - and there we have 
> been fighting flaky tests a lot. We run Java 8 and Java 11 builds on every 
> Pull Request and due to test flakiness, almost all of them turn out red with 
> 1 or 2 tests (completely unrelated to the change in the PR) failing. This has 
> resulted in committers ignoring them and merging the changes either way, or 
> in the worst case - rerunning the hour-long build until it becomes green.

I hope most committers won't commit unless the flaky test is definitely not in 
the subsystem they touched. But yes, one of the motivations for speeding up 
tests (parallelized on a containerized hosted CI platform) was to cut down the 
time for (re-)running them.
 
> This test flakiness has also slowed down our releases significantly.
> 
> In general, I was just curious to understand if this is a problem that 
> Cassandra faces as well.

Yes


> Does your project have a lot of intermittently failing tests,

Sometimes more than others. There were a few big pushes to get to green, though 
it naturally regresses a bit over time.

> do you have any active process of addressing such tests (during the initial 
> review, after realizing it is flaky, etc). Any pointers will be greatly 
> appreciated!

I don’t think we’ve solved this convincingly. Different large (corporate) 
contributors have done long one-time passes, and that helped a ton, but I don’t 
think there are any silver bullets yet.
