Cassandra project biweekly status update 2022-02-07

Joshua McKenzie Mon, 07 Feb 2022 10:23:49 -0800

This is the Special "We need to talk" edition. :)

Something interesting changed in the past two weeks - we had our first
couple of rotations of a Build Lead (
https://cwiki.apache.org/confluence/display/CASSANDRA/Build+Lead).

And why do we need to talk? Well, Brandon and I created a lot of test
failure tickets. And by "a lot", I mean 42:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20reporter%20IN%20(jmckenzie%2C%20brandon.williams)%20AND%20created%20%3E%20-14d
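
For anyone who'd rather read the filter than squint at the URL encoding, here's a minimal sketch that decodes the JQL from the link above using only the Python standard library. The function name decode_jql is just illustrative. Note that the query uses a relative date (-14d), so the number of tickets it returns will drift from the 42 quoted here as time passes.

```python
from urllib.parse import urlsplit, parse_qs

# The search link quoted above, copied verbatim.
URL = (
    "https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA"
    "%20AND%20reporter%20IN%20(jmckenzie%2C%20brandon.williams)"
    "%20AND%20created%20%3E%20-14d"
)

def decode_jql(url):
    """Return the percent-decoded value of the 'jql' query parameter."""
    query = urlsplit(url).query
    return parse_qs(query)["jql"][0]

if __name__ == "__main__":
    print(decode_jql(URL))
```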

If you take a look at what's going on in Butler, you'll see that for 3.0,
3.11, and 4.0 our test failure rates are either increasing or holding
steady, with a current total of 82 test failures across these three
versions. Even if we assume every one of those failures is duplicated on
all three branches (generous of us), that still leaves us with roughly 27
test failures on each branch. That many failures effectively leaves us
holding our noses and merging changes that don't fix tests on top of an
already failing board, slowly worsening an already messy situation.

For what it's worth, if we include trunk things get differently murky. On
the plus side we only have 16 failures there today, whereas on the downside
that's "for today" and test runs on trunk can't seem to make up their mind,
ranging from a low of 10 failures to highs of 49 and 67.

So what can we do about this? Well, if we had only 15 active contributors
(undershooting to illustrate the point) and each of them took 2 test
failures each week for the next 2 weeks, that'd be enough to drive down
most if not all of the failures across 3.0, 3.11, and 4.0. It's important
that we keep a clean test board because when things like security CVEs or
data loss defects come along, we need to be able to cut a quick hotfix
release without worrying about whether we're introducing new regressions
into critical production systems running GA release lines. It's hard to
overstate how critical this is to us as a project.
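
As a back-of-the-envelope check on the numbers above, here's a small sketch of the arithmetic. The function names are made up for illustration, and the inputs are just the figures quoted in this update (82 failures, 3 branches, 15 contributors, 2 fixes a week, 2 weeks), not anything pulled live from Butler.

```python
# Figures quoted in this status update; not fetched from Butler or JIRA.
TOTAL_FAILURES = 82   # across 3.0, 3.11, and 4.0
BRANCHES = 3

def per_branch_if_all_duplicated(total, branches):
    """Failures per branch if every failure shows up on every branch."""
    return total // branches

def fix_capacity(contributors, fixes_per_week, weeks):
    """How many failures get picked up if everyone takes their share."""
    return contributors * fixes_per_week * weeks

per_branch = per_branch_if_all_duplicated(TOTAL_FAILURES, BRANCHES)
capacity = fix_capacity(contributors=15, fixes_per_week=2, weeks=2)

if __name__ == "__main__":
    print("failures per branch (best case):", per_branch)
    print("fixes in two weeks:", capacity)
    print("covers the best case:", capacity >= per_branch)
    print("covers all 82 even with no duplicates:", capacity >= TOTAL_FAILURES)
```

In other words, 60 fixes comfortably covers the fully-duplicated case and falls a bit short of 82 if none of the failures overlap, which is where "most if not all" comes from.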

So in short, the outstanding question we as a project haven't tackled yet
is: how are we going to resource fixing these tests now that we have them
wired up in Butler and JIRA and have them identified?

[New contributors]
Did you know fixing failing tests is a great way to get to know the
Cassandra codebase? :) I mean that in all seriousness, not in jest because
of what's above. Tests can be tricky, interesting, and quite educational if
you're opening up an area of the codebase you haven't worked on before, and
you can always hit up @cassandra_mentors in the #cassandra-dev channel on
the ASF slack server here: https://the-asf.slack.com

For convenience, here's a link to a kanban board of the currently
identified failing test JIRA tickets that haven't been assigned yet:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=496&quickFilter=2252
64 of them! A veritable cornucopia of interesting work.

If test failures aren't your bag, we have 14 unassigned tickets that are
solid starter tickets on the 4.0.x line, and 14 on the 4.x line you could
tackle:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2162&quickFilter=2160

[Dev mailing list conversations]
https://lists.apache.org/list?dev@cassandra.apache.org:lte=2w:

The last two weeks have seen momentum pick up a bit on the dev list. Some
highlights to get engaged with:

The trie-memtable thread got a very interesting update today; Branimir
attached some performance numbers from the implementation compared to our
current skip list:
https://lists.apache.org/thread/fdvf1wmxwnv5jod59jznbnql23nqosty

Ekaterina landed CASSANDRA-15234 to standardise our config and JVM
parameters. This is a significant achievement and a ton of work to get
across the line - congrats Ekaterina! Email thread here:
https://lists.apache.org/thread/qf4ctv1067hz5j0pm6wc75rr44kospk4
One thing to be aware of is the follow-up Ekaterina sent to the list about
our test suite misbehaving:
https://issues.apache.org/jira/browse/CASSANDRA-17351.

The ever-lurking zombie conversation about ant vs. maven vs. gradle has
risen from the dead again:
https://lists.apache.org/thread/jksl415lvfmrnh7z7xvy41v3d25twc5w. We've
never really put a bow on this in the past and traditionally the
conversation fizzles out; the outstanding request from several of us today
is a clear enumeration of pros vs. cons, value vs. cost for each of the
different build systems so we can either make a decision as a project on
this or agree to put it to rest and not revisit it for a set amount of
time. To whoever decides to take up that torch, know that there are hordes
of people ready to share their opinions about their favored build system
with you (not sure if this is encouraging or not =/).

SharanF opened up an interesting and much-needed (editorializing alert)
thread about non-Java-code contributing committers on the project here:
https://lists.apache.org/thread/mlqqxcmyz60fd8mzn66nslp5nxlnryld. The
overloading of the term to mean both "someone with commit bit who commits
code" and "someone who is committed to the project" is something we've
stumbled upon in the past; would love to hear what people think on the
topic. (Reference community.apache.org article on committers here:
https://community.apache.org/contributors/)

And last but not least, I'd like to call attention to the interesting
discussions going on around Storage Attached Indexes (SAI) and whether to
include OR support in the initial CEP:
https://lists.apache.org/thread/50t6p19s4c05wo1s5j510l195t5n6s10

[Development velocity]
We've closed out 6 issues on the 4.0.x line:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2175
Some highlights include some broad fixes to intermittent in-jvm dtest
failures, cleaning up a bit in the PasswordObfuscator, and some packaging
and documentation. All just the kind of minor polishing changes we love to
see on a low ordinal GA release.

We had 14 issues closed out on the 4.x line, with some highlights being
significant increases in VIntCoding speed (C-15215), standardizing config
and JVM parameters as mentioned above (C-15234), the removal of
Windows-specific classes to clean up some vestigial bits in the codebase
(bittersweet for me, that one: C-16956), and some general tidying around
old Python versions and test fixes.

[CI status]
See above. It's rough, but it's nothing we can't fix if we put our minds to
it.

And that about covers it for today - thanks everyone for reading and for
all your contributions on the project!

~Josh
