Hi dev@,

Mick asked that I check in with the dev list about CASSANDRA-15393. There's some
concern regarding the patch and its suitability for inclusion in 4.0-beta.

CASSANDRA-15393 reduces garbage created by the compaction and read paths by
about 25%. It's part of CASSANDRA-15387, which, including this patch, reduces
garbage from the read and compaction paths by about 50%. CASSANDRA-15393 does
this by supporting byte array backed cell and clustering types, which is
achieved by abstracting the backing type (ByteBuffer/byte[]) from the
serialization logic.

To avoid paying the allocation cost of adding a container object, singleton 
"accessor" objects are used to operate on the actual data. See here for an 
example: https://gist.github.com/bdeggleston/52910225b817a8d54353125ca03f521d
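
For readers who don't want to open the gist, here is a rough sketch of the
accessor idea. It is not the code from the patch: ValueAccessor,
ByteArrayAccessor, ByteBufferAccessor and Int32Codec are hypothetical names
chosen for illustration, and the ByteBuffer path assumes the default big-endian
byte order.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: these names are illustrative, not the patch's API.
interface ValueAccessor<V> {
    int size(V value);
    int getInt(V value, int offset);
}

// Stateless singleton: no per-value wrapper object is allocated.
final class ByteArrayAccessor implements ValueAccessor<byte[]> {
    static final ByteArrayAccessor INSTANCE = new ByteArrayAccessor();
    private ByteArrayAccessor() {}

    @Override public int size(byte[] value) { return value.length; }

    @Override public int getInt(byte[] value, int offset) {
        // Big-endian, matching ByteBuffer's default byte order.
        return ((value[offset] & 0xFF) << 24)
             | ((value[offset + 1] & 0xFF) << 16)
             | ((value[offset + 2] & 0xFF) << 8)
             |  (value[offset + 3] & 0xFF);
    }
}

final class ByteBufferAccessor implements ValueAccessor<ByteBuffer> {
    static final ByteBufferAccessor INSTANCE = new ByteBufferAccessor();
    private ByteBufferAccessor() {}

    @Override public int size(ByteBuffer value) { return value.remaining(); }

    @Override public int getInt(ByteBuffer value, int offset) {
        // Absolute read relative to the position; the position is not moved.
        // Assumes the buffer is in its default big-endian order.
        return value.getInt(value.position() + offset);
    }
}

// Serialization logic written once, independent of the backing type.
final class Int32Codec {
    private Int32Codec() {}

    static <V> int deserialize(V value, ValueAccessor<V> accessor) {
        if (accessor.size(value) != 4)
            throw new IllegalArgumentException("expected 4 bytes, got " + accessor.size(value));
        return accessor.getInt(value, 0);
    }
}
```

The same Int32Codec.deserialize call works for a byte[] with
ByteArrayAccessor.INSTANCE or for a ByteBuffer with ByteBufferAccessor.INSTANCE,
and neither call allocates a wrapper around the value.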

Mick and Robert Stupp have raised a few concerns, summarized below:

1. The patch is large (208 files / ~3.5k LOC)
2. Concerns about impact on stability
3. Parameterizing cell/clustering value types in this way makes 
ClassCastExceptions possible.
4. Implications of the feature freeze

The patch is large, but the vast majority of it is adding type parameters to 
things. The changes here are wide, but not deep. The most complex parts are the 
collection serializers and other places where we're now having to do offset 
bookkeeping. These should be carefully reviewed, but they shouldn't be too
difficult to verify, and I've added some randomized tests to check them against
a wide range of schemas. I'll also run some diff tests against clusters 
internally.
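
To make the "offset bookkeeping" point concrete, here is a minimal sketch of
the kind of code involved and of a randomized check for it. Everything in it is
an assumption for illustration: the layout ([int count], then [int length]
[bytes] per element) is a generic length-prefixed format, and ListCodecSketch
is a made-up class, not the serializers or tests from the patch.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical layout for illustration: [int count], then per element [int length][bytes].
final class ListCodecSketch {
    private ListCodecSketch() {}

    static byte[] encode(List<byte[]> elements) {
        int size = 4;
        for (byte[] e : elements) size += 4 + e.length;
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(elements.size());
        for (byte[] e : elements) { out.putInt(e.length); out.put(e); }
        return out.array();
    }

    // ByteBuffer version: the buffer's position tracks progress for us.
    static List<byte[]> decodeBuffer(ByteBuffer in) {
        ByteBuffer buf = in.duplicate(); // leave the caller's position untouched
        int count = buf.getInt();
        List<byte[]> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] e = new byte[buf.getInt()];
            buf.get(e);
            result.add(e);
        }
        return result;
    }

    // byte[] version: the offset has to be advanced by hand after every read.
    static List<byte[]> decodeArray(byte[] in) {
        int offset = 0;
        int count = readInt(in, offset);
        offset += 4;
        List<byte[]> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int length = readInt(in, offset);
            offset += 4;
            result.add(Arrays.copyOfRange(in, offset, offset + length));
            offset += length;
        }
        return result;
    }

    static int readInt(byte[] b, int o) {
        return ((b[o] & 0xFF) << 24) | ((b[o + 1] & 0xFF) << 16)
             | ((b[o + 2] & 0xFF) << 8) | (b[o + 3] & 0xFF);
    }

    // Randomized check: both decoders must round-trip the same random input.
    static boolean randomizedCheck(long seed, int iterations) {
        Random random = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            List<byte[]> elements = new ArrayList<>();
            int count = random.nextInt(8);
            for (int j = 0; j < count; j++) {
                byte[] e = new byte[random.nextInt(16)];
                random.nextBytes(e);
                elements.add(e);
            }
            byte[] encoded = encode(elements);
            if (!sameElements(elements, decodeArray(encoded))
                || !sameElements(elements, decodeBuffer(ByteBuffer.wrap(encoded))))
                return false;
        }
        return true;
    }

    static boolean sameElements(List<byte[]> a, List<byte[]> b) {
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++)
            if (!Arrays.equals(a.get(i), b.get(i))) return false;
        return true;
    }
}
```

Checking the hand-tracked offset path against the position-based path on random
input is one cheap way to catch off-by-N mistakes in this kind of code.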

Parameterizing cell and clustering values does make ClassCastExceptions 
possible, but Java's type system guards against this for the most part.
Regarding the feature freeze, I don't think it applies to performance 
improvements.
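
As a small illustration of the ClassCastException concern, the sketch below
shows where the compiler helps and where it doesn't. CellSketch and CastSketch
are hypothetical names, not classes from the patch; the point is only that a
mismatch needs an unchecked cast (or a raw type) to get past the compiler, and
then fails at the point of use.

```java
import java.nio.ByteBuffer;

// Hypothetical value holder parameterized on its backing type.
final class CellSketch<V> {
    final V value;
    CellSketch(V value) { this.value = value; }
}

final class CastSketch {
    private CastSketch() {}

    static int length(CellSketch<byte[]> cell) { return cell.value.length; }

    // The compiler rejects length(new CellSketch<ByteBuffer>(...)) outright.
    // An unchecked cast compiles (with a warning) and defers the failure.
    @SuppressWarnings("unchecked")
    static CellSketch<byte[]> unsafeCast(CellSketch<?> cell) {
        return (CellSketch<byte[]>) cell;
    }

    // Returns true if the mismatch surfaces as a ClassCastException at use.
    static boolean failsAtUse() {
        CellSketch<byte[]> polluted = unsafeCast(new CellSketch<>(ByteBuffer.allocate(4)));
        try {
            length(polluted); // value is really a ByteBuffer, not a byte[]
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Because of erasure, the cast itself succeeds and the exception only appears when
the value is read as a byte[], so unchecked-cast warnings are the places worth
auditing.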

Back to the point about stability though: in practice, compaction GC is a major
contributor to cluster instability. In my experience, about 30% of availability
issues are GC-related. Also, compaction GC tends to be the limiting factor for
repair, host replacements, and other topology changes, which limits how quickly
you can recover from other issues. So the patch does add some risk, but I think 
it's a net win for stability.

Thoughts?