Hi,
I would add that CEP-20 DDM (CASSANDRA-17940) has made huge progress and is nearing
completion, all praise and glory to Andres. Also, TTL (CASSANDRA-14227) has made big
progress: the first round of review is complete and only the sstable
format/feature flag switch is still pending.
Regards
On 20/3/23 21:30, Miklosovic, Stefan wrote:
Thank you, Josh, for continuing to write these summaries.
________________________________________
From: Josh McKenzie <jmcken...@apache.org>
Sent: Monday, March 20, 2023 20:34
To: dev
Subject: Cassandra project status, 2023-03-20
I did say monthly-ish. That goes both earlier and later.
We've had a lot of interesting topics come up on the dev list in the past few
weeks as well as movement on Accord, Transactional Metadata, and SAI, so let's
get to it.
The Cassandra Forward event took place on March 14th with a lot of interesting
talks and attendees (link: https://www.cassandrasummit.org/cassandra-forward).
You can watch recordings of the different talks on the site as well as sign up
for the Cassandra Summit that's been rescheduled to December 12-13th. Hope to
see you there!
[New Contributors Getting Started]
We have a lot of great starter tickets if you're interested in diving in on
the project - you can see the list on the kanban board w/quick filters here:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2454&quickFilter=2652&quickFilter=2162&quickFilter=2160.
Anything on this list should be assignee-free and up for grabs so feel free to take a crack at
them.
For assistance getting set up or oriented in the code, join the
#cassandra-dev channel on https://the-asf.slack.com (reply to me on this email
if you need an invite for your account), and feel free to tag the
@cassandra_mentors alias with questions.
[Dev mailing list]
https://lists.apache.org/list?dev@cassandra.apache.org:dfr=2023-3-3|dto=2023-3-20:
We have 26 threads this time - a bit more manageable but surprisingly busy for 18
days.
We've had a lot of discussion about downgradability; Branimir originally
brought the topic back up here:
https://lists.apache.org/thread/tcp339k5ph8ql35wxr085to4qgp8tpg7, and that
thread was still kicking since the last status update. Attempting a bit of
editorializing, various points that were brought up and not contended:
- Try and not break sstable format compatibility with a change if it's
reasonable not to
- Users should be able to opt-in to major format upgrades and not have access
to new features until such time as they've opted in
- We should have an offline sstabledowngrade tool
- Nodes should be able to write older version sstables if configured to do so
(how many versions and where that code lives is somewhat unclear still)
- We need simple tests (upgrade tests backwards) to see what works and doesn't
work to know the scope of the problem
- Jacek created the epic https://issues.apache.org/jira/browse/CASSANDRA-18300
to track work on downgradability
There's a good bit we discussed on the thread not yet captured in JIRA;
assuming nobody has significant disagreement with the list above I may create
tickets for the things we haven't yet captured so we don't lose that context.
Also - if I missed something you brought up on that thread that you want to see
captured as well, let me know and I'll take care of that.
Another thread that's seen a lot of traffic without yet concluding: "[DISCUSS] Next release
date" (https://lists.apache.org/thread/fncbr50xg1otw8xtpyn0b3ys02bfnwv1). It seems like we
were headed towards a "set a target release date, back up N weeks based on how long we think
it will take to validate that, and set that as our branch / freeze date" conclusion. Jeremiah
offered October with a potential September freeze if we believe ourselves capable of a 4 week
validation, and David asked some pointed questions about why 4.1 took so long to release and
whether we have enough testing to trust trunk today. If you have some thoughts on the topic, please
don't let the thread lie dormant; it's important we come to a consensus on this and agree on a
target to push for.
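The "back up N weeks from the target" arithmetic can be sketched in a few lines. This is only an illustration: the thread names October and a 4 week validation window, but the specific day (October 1st) and the function name are assumptions made up for the example.

```python
from datetime import date, timedelta


def freeze_date(target_release: date, validation_weeks: int) -> date:
    """Back up from the target release date by the expected validation time."""
    return target_release - timedelta(weeks=validation_weeks)


# Hypothetical target day; the thread only says "October".
target = date(2023, 10, 1)
print(freeze_date(target, validation_weeks=4))  # 2023-09-03
```

A longer validation estimate simply pushes the branch / freeze date earlier by the same number of weeks.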
Stefan created and reminded us of CASSANDRA-18043, "remove deprecated
DateTieredCompactionStrategy". It's been deprecated for years now so it's probably
time to go.
Speaking of deprecation, we've been discussing the role of the hadoop
integration code in the codebase (link:
https://lists.apache.org/thread/q34zsscctgn6kpwkflx03859y7nv3y5z). The general
consensus appears to be for deprecation in 4.x and removal in 5.0 given the
code is unmaintained and very, very old.
Stefan brought up the somewhat problematic case with NetworkTopologyStrategy where RF >
number of racks, since the strategy can place things in a way where you lose QUORUM if you
lose a rack (link: https://lists.apache.org/thread/dntymkm1b9xjs1bognf3w1lpf1mdrzos). The
consensus on that thread was that we should make NTS do the right thing going forward but
also preserve the ability to do things "the old way". See this JIRA for more
details: https://issues.apache.org/jira/browse/CASSANDRA-16203
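To make the QUORUM concern concrete, here is a minimal sketch of the arithmetic. It does not model how NetworkTopologyStrategy actually places replicas; it takes a replica-per-rack placement as given and checks whether a quorum survives the loss of any single rack. The function and rack names are hypothetical.

```python
from typing import Mapping


def quorum(rf: int) -> int:
    """Number of replicas needed for QUORUM at a given replication factor."""
    return rf // 2 + 1


def survives_any_single_rack_loss(replicas_per_rack: Mapping[str, int]) -> bool:
    """True if QUORUM is still reachable after losing any one rack."""
    rf = sum(replicas_per_rack.values())
    needed = quorum(rf)
    return all(rf - lost >= needed for lost in replicas_per_rack.values())


# RF=3 over 2 racks: one rack must hold 2 replicas, so losing it leaves 1 < 2.
print(survives_any_single_rack_loss({"rack1": 2, "rack2": 1}))  # False
# RF=3 over 3 racks: losing any rack leaves 2 replicas, which is a quorum.
print(survives_any_single_rack_loss({"rack1": 1, "rack2": 1, "rack3": 1}))  # True
```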
Bowen Song raised the topic of potentially enhancing how we handle disk errors:
https://lists.apache.org/thread/gwyz9otgokqvmdrq85nw3ds5nyrhz8t3. Some
interesting ideas came up on the thread as well as questions about what we
could potentially do with the current state of the art vs. a future with
transactional metadata. No conclusions quite yet, but the notion of having
replicas selectively reject token ranges for which they've gotten disk errors
was raised; however, some interesting and challenging questions have come up (how
do you surface the error rate to operators? Can you repair automatically N
number of times before failing? What's the frequency of the corruption and
locality of it, and do we respond differently based on different parameters?).
Curious to see where this thread goes.
Mick opened a [DISCUSS] thread concerning raising the MessagingService.minimum_version to
40 on trunk here: https://lists.apache.org/thread/1pcnth265xb3jyf832dlgtbgsnqvtdot with
the goal of cleaning up some code potentially and simplifying things. This falls under
the general blanket of "Post 4.x Clean up" which is being tracked by the epic
here: https://issues.apache.org/jira/browse/CASSANDRA-18306
Ekaterina's thread on Cassandra and JDK17 concluded:
https://lists.apache.org/thread/ps5lvp3nxpwzwnpf472v02qyl9kjqybh. The consensus:
1. we keep current access to JDK internals
2. we only add more after careful deliberation
3. we take steps to remove the areas where we're coupled with unsafe internals
when more formal supported APIs become available
Sam Tunnicliffe pushed a new branch, cep-21-tcm, to the ASF repo:
https://lists.apache.org/thread/qkwnhgq02cn12jon2h565kh2gpzp9rry. Progress on
this work, along with related ticket filing, is planned to be tracked under the
epic for Delivery of CEP-21: Transactional Cluster Metadata here:
https://issues.apache.org/jira/browse/CASSANDRA-18330
As Accord has some dependencies on CEP-21, Caleb proposed we pause merging
cep-15-accord to trunk and instead rebase it on cep-21-tcm:
https://lists.apache.org/thread/obyohrpvv6tpsgxp5z268wzbo316b42m. No major
engagement on that thread and it's been close to a week.
A conversation came up about positional vs. flagged arguments and whether, if at all, we
should be willing to make changes to break compatibility with older versions of tools.
Initial feedback on this one is "Don't break the old way, feel free to add new
ways": https://lists.apache.org/thread/8xfv1k1ncc7g1q60w3183nmrn4xkrg9s. Full
disclosure: I'm one of 2 people to chime in there so if you have a different opinion
speak up! :)
JDK-17 is requiring us to update our chronicle-queue dependency for audit
logging and fql which is bringing along a variety of other dependencies:
https://lists.apache.org/thread/hp5tokf2gr20dp3w5mzssgm4xgt97wg6.
Mick kicked the hornet's nest by suggesting we drop support for m* sstable formats
here: https://lists.apache.org/thread/bxg30nol25oxf1hvpkrqobopxszrnor2. This ties
into the thread above about downgradability and how seamless we make it for
operators to work with older versions of data on newer binaries. No real conclusions
from the thread yet but I'm optimistic about where this energy and focus leads
(specifically the "make upgrades and downgrades safer so operators' lives are
easier").
Branimir put up a PR for UCS (Unified Compaction Strategy, CEP-26:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy).
The email thread about this can be found here:
https://lists.apache.org/thread/l9wswkkj3lxz2mvg75fmn0krr8ys4qd1. Active
thread; Branimir reached out this morning on it. I can't help but notice
there's still no JIRA linked from that CEP... ;)
And there we have it! Busy few weeks with some good discussions going on.
[Checking in on CI]
https://butler.cassandra.apache.org/#/
How did we fare in Feb?
3.0: 11 -> 9
3.11: 19 -> 22 (dropped to 13 then spiked)
4.0: 3 -> 8
4.1: 3 -> 3
trunk: 7 -> 8
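For a quick tally of the movement per branch, a small sketch follows. The before/after numbers are copied from the list above; the dictionary layout and variable names are illustrative only.

```python
# (before, after) counts per branch, copied from the list above.
counts = {
    "3.0": (11, 9),
    "3.11": (19, 22),
    "4.0": (3, 8),
    "4.1": (3, 3),
    "trunk": (7, 8),
}

# Per-branch change, positive meaning the count went up.
deltas = {branch: after - before for branch, (before, after) in counts.items()}

for branch, (before, after) in counts.items():
    print(f"{branch}: {before} -> {after} ({deltas[branch]:+d})")

print(f"net change across branches: {sum(deltas.values()):+d}")
```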
When you look at the graphs, it's pretty clear that the more we touch branches
the more unstable their CI is. Which is actually good news as that implies the
underlying CI infrastructure is at least _somewhat_ stable enough to produce
consistent results when the code isn't frequently changed. Low single digits on
4.1 is reassuring, and on trunk we have 20 tests flagged as flaky and 3 marked
as suspicious failures in Butler. Run 1496 looks like it was a bit flaky in
general; timeouts and an inability to connect to the cluster on a local port in
python dtests point to env issues.
[What's been closed out]
Here's a custom quick filter to give us an overview of the last 17 days:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2278
... quite a few tickets in this time frame compared to the prior entire month.
5.0: 39 issues(!):
- A large collection of accord work:
  - Operations.migrateReadRequiredOperations fails due to concurrent access
    when TransactionStatement is prepared (CASSANDRA-18337)
  - Remove git hook for pre-push (CASSANDRA-18309)
  - Fix AIOOBE and improve validation messages for transaction statements
    (CASSANDRA-18302)
  - Add support for prepared statements for accord transactions
    (CASSANDRA-18299)
  - Migrate accord away from JDK random to a new interface RandomSource
    (CASSANDRA-18213)
  - Refactor transaction state storage (CASSANDRA-18192)
  - Remove futures in favor of AsyncChain (CASSANDRA-18004)
  - Initial Cassandra integration (CASSANDRA-17412)
- Add max_sstable_size and max_sstable_duration metrics virtual tables
(CASSANDRA-18333)
- Extend implicit allow filtering to clustering keys on virtual tables
(CASSANDRA-18331)
- Remove deprecated CQL functions dateof and unixtimestampof (CASSANDRA-18328)
- in-jvm dtest Cluster.close issue led to Simulator errors (CASSANDRA-18320)
- Add feature flag for dynamic data masking (CASSANDRA-18316)
- BufferPool incorrectly counts memoryInUse when putUnusedPortion is used
(CASSANDRA-18311)
- Test fixes (CASSANDRA-18308)
- Fix some missing documentation that got lost during transition
(CASSANDRA-18303)
- Fix die disk failure policy not killing JVM (CASSANDRA-18294)
- Remove JAVA8_HOME and JAVA11_HOME from circle configs (CASSANDRA-18293)
- Ordering issue on application of schema version changes in local system table
(CASSANDRA-18291)
- Enhance output of nodetool tablestats (CASSANDRA-18283)
- More logging around CompactionManager operations (CASSANDRA-18268)
- Added circleci config files for J11+J17 (CASSANDRA-18247)
- Reduce memory allocations of calls to ByteBuffer.duplicate in
CBUtil.writeValue (CASSANDRA-18212)
- upgradesstables does not always upgrade tables in proper order
(CASSANDRA-18143)
- Upgrade maven-shade-plugin to fix shaded dtest JAR build (CASSANDRA-18136)
- Improve memtable allocator accounting when updating AtomicBTreePartition
(CASSANDRA-18125)
- Update CCM for JDK17 and revise JDK detection (CASSANDRA-18106)
- Allow ability to use user-defined functions as masking functions
(CASSANDRA-18071)
- Add a new SELECT_MASKED permission (CASSANDRA-18070)
- Add a new UNMASK permission (CASSANDRA-18069)
- Remove deprecated DateTieredCompactionStrategy (CASSANDRA-18043)
- Remove -l, -m, -h from circle generator and use only free or paid options
(CASSANDRA-18012)
- Change trunk from 4.2 to 5.0 (CASSANDRA-17973)
- Update Opcodes.ASM7 when JDK17 support is added (CASSANDRA-17971)
- Deprecate org.apache.cassandra.hadoop code (CASSANDRA-16984)
- Implement parsing schema provider for external SUT in Harry
- Implement concurrent quiescent checker in Harry (CASSANDRA-18315)
- Improve column subset testing capabilities in Harry (CASSANDRA-17603)
4.1.x: 1 issue:
- Fix CompactionStrategyManagerBoundaryReloadTest.testReload with TrieMemtables
(CASSANDRA-18144)
Unversioned: 3 issues:
- Bump snakeyaml to 2.0 (CASSANDRA-18340)
- Fix Debian repository misconfiguration (CASSANDRA-18326)
- Fix release 4.0.8 availability in jfrog (CASSANDRA-18307)
Phew. So that's an awful lot for 18 days; I think this might be the longest
list of deltas I've summarized here yet.
Well played project, well played.
~Josh