>
> *** The Butler (Build Lead)
>
> The introduction of Butler and the Build Lead was a wonderful
> improvement to our CI efforts. It has brought a lot of hygiene in
> listing out flakies as they happened. Noted that this has in-turn
> increased the burden in getting our major releases out, but that's to
> be seen as a one-off cost.
>
New Failures from Build Lead Week 3.

*** CASSANDRA-18156 – repair_tests.deprecated_repair_test.TestDeprecatedRepairNotifications.test_deprecated_repair_error_notification - AssertionError: Node logs don't have an error message for the failed repair - hard regression - 3.0, 3.11,
*** CASSANDRA-18164 – CASTest Message serializedSize(12) does not match what was written with serialize(out, 12) for verb PAXOS2_COMMIT_AND_PREPARE_RSP - serializer class org.apache.cassandra.net.Message$Serializer; expected 1077, actual 1079 - 4.1, trunk
*** CASSANDRA-18158 – org.apache.cassandra.distributed.upgrade.MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck - Cannot achieve consistency level ALL - 3.11, trunk
*** CASSANDRA-18159 – repair_tests.repair_test.TestRepair.test_*dc_repair - AssertionError: null in MemtablePool$SubPool.released(MemtablePool.java:193) - 3.11, 4.0, 4.1, trunk
*** CASSANDRA-18160 – cdc_test.TestCDC.test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space - Found orphaned index file in after CDC state not in former - 4.1, trunk
*** CASSANDRA-18161 – org.apache.cassandra.transport.CQLConnectionTest.handleCorruptionOfLargeMessageFrame - AssertionFailedError in CQLConnectionTest.testFrameCorruption(CQLConnectionTest.java:491) - 4.0, 4.1, trunk
*** CASSANDRA-18162 – cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_non_prepared_statements - Inet address 127.0.0.3:7000 is not available: [Errno 98] Address already in use - 3.0, 3.11, 4.0, 4.1, trunk
*** CASSANDRA-18163 – transient_replication_test.TestTransientReplicationRepairLegacyStreaming.test_speculative_write_repair_cycle - AssertionError Incoming stream entireSSTable - 4.0, 4.1, trunk

While writing these up, some thoughts…

- While Butler reports failures against multiple branches, there's no feedback/sync that the ticket needs its fixVersions updated when failures happen in other branches after the ticket is created.
- From 4.0 onwards, the majority of the failures are timeouts (>900s), reinforcing that the main problem we currently face in ci-cassandra.a.o is saturation/infra