[DISCUSS] CEP-23: Enhancement for Sparse Data Serialization

2022-09-05 Thread Claude Warren via dev
I have just posted a CEP covering an Enhancement for Sparse Data 
Serialization.  This is in response to CASSANDRA-8959.


I look forward to responses.




Re: [DISCUSS] CEP-23: Enhancement for Sparse Data Serialization

2022-09-05 Thread Josh McKenzie
Could you post a link to that? I don't see it on the wiki: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652201

On Mon, Sep 5, 2022, at 4:57 AM, Claude Warren via dev wrote:
> I have just posted a CEP covering an Enhancement for Sparse Data 
> Serialization.  This is in response to CASSANDRA-8959
> 
> I look forward to responses.
> 
> 
> 


Re: [DISCUSS] CEP-21: Transactional Cluster Metadata

2022-09-05 Thread Henrik Ingo
Mostly I just wanted to ack that at least someone read the doc (somewhat
superficially sure, but some parts with thought...)

> One pre-feature that we would include in the preceding minor release is a
> node level switch to disable all operations that modify cluster metadata
> state. This would include schema changes as well as topology-altering
> events like move, decommission or (gossip-based) bootstrap and would be
> activated on all nodes for the duration of the *major* upgrade. If this
> switch were accessible via internode messaging, activating it for an
> upgrade could be automated. When an upgraded node starts up, it could send
> a request to disable metadata changes to any peer still running the old
> version. This would cost a few redundant messages, but simplify things
> operationally.
>
> Although this approach would necessitate an additional minor version
> upgrade, this is not without precedent and we believe that the benefits
> outweigh the costs of additional operational overhead.
>

Sounds like a great idea, and probably necessary in practice?


> If this part of the proposal is accepted, we could also include further
> messaging protocol changes in the minor release, as these would largely
> constitute additional verbs which would be implemented with no-op verb
> handlers initially. This would simplify the major version code, as it would
> not need to gate the sending of asynchronous replication messages on the
> receiver's release version. During the migration, it may be useful to have
> a way to directly inject gossip messages into the cluster, in case the
> states of the yet-to-be upgraded nodes become inconsistent. This isn't
> intended, so such a tool may never be required, but we have seen that
> gossip propagation can be difficult to reason about at times.
>

Others will know the code better, and I understand that adding new no-op
verbs can be considered safe... But instinctively I am a bit hesitant on
this one. Surely adding a few if statements to the upgraded version isn't
that big of a deal?

Also, it should make sense to minimize the dependencies from the previous
major version (without CEP-21) to the new major version (with CEP-21). If a
bug is found, it's much easier to fix code in the new major version than
the old and supposedly stable one.
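
For illustration only, the "few if statements" alternative could look roughly
like the sketch below: the upgraded node checks the receiver's release version
before sending a new verb. This is not Cassandra code; `Peer`,
`NEW_VERB_MIN_VERSION`, `supports_new_verbs` and `replicate_metadata` are
hypothetical names, and the version number is an assumed placeholder.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: the first release whose nodes understand the new verbs.
NEW_VERB_MIN_VERSION = (5, 0)


@dataclass(frozen=True)
class Peer:
    """Hypothetical view of a remote node as seen by the sender."""
    name: str
    release_version: Tuple[int, int]  # (major, minor)


def supports_new_verbs(peer: Peer) -> bool:
    # Tuples compare element by element, so (4, 1) < (5, 0).
    return peer.release_version >= NEW_VERB_MIN_VERSION


def replicate_metadata(peers: List[Peer], payload: str) -> List[Tuple[str, str]]:
    """Return the (peer name, payload) pairs that would actually be sent."""
    sent = []
    for peer in peers:
        # The gate lives only in the new major version; old nodes are
        # skipped rather than being taught a no-op handler.
        if supports_new_verbs(peer):
            sent.append((peer.name, payload))
    return sent
```

With a gate of this kind, all of the version logic stays in the new major
release, which is the point made above about keeping fixes out of the older,
stable version.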

henrik

-- 

Henrik Ingo

+358 40 569 7354



Re: [DISCUSS] CEP-23: Enhancement for Sparse Data Serialization

2022-09-05 Thread Abe Ratnofsky
Looking at this link: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-23%3A++Enhancement+for+Sparse+Data+Serialization

Do you have any plans to include benchmarks in your test plan? It would be 
useful to include disk usage / read performance / write performance comparisons 
with the new encodings, particularly for sparse collections where a subset of 
data is selected out of a collection.

I do wonder whether this is CEP-worthy. The CEP says that the changes will not 
impact existing users, will be backwards compatible, and overall is an 
efficiency improvement. The CEP guidelines say a CEP is encouraged “for 
significant user-facing or changes that cut across multiple subsystems”. Any 
reason why a Jira isn’t sufficient?

Abe

> On Sep 5, 2022, at 1:57 AM, Claude Warren via dev wrote:
> 
> I have just posted a CEP covering an Enhancement for Sparse Data 
> Serialization.  This is in response to CASSANDRA-8959
> 
> I look forward to responses.
> 
> 



Re: [DISCUSS] CEP-23: Enhancement for Sparse Data Serialization

2022-09-05 Thread Claude Warren via dev
I am just learning the ropes here, so perhaps it is not CEP-worthy.  That 
being said, it felt like there was a lot of information to put into and 
track in a ticket, particularly when I expected discussion about how best 
to encode, changes to the algorithms, etc.  It feels like it would be 
difficult to track. But if that is standard for this project, I will move 
the information there.


As to the benchmarking, I had thought that usage and performance 
measures should be included.  Thank you for calling out queries that 
select a subset of data from a collection as being of particular importance.


Claude

On 06/09/2022 03:11, Abe Ratnofsky wrote:

> Looking at this link: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-23%3A++Enhancement+for+Sparse+Data+Serialization
> 
> Do you have any plans to include benchmarks in your test plan? It would be 
> useful to include disk usage / read performance / write performance comparisons 
> with the new encodings, particularly for sparse collections where a subset of 
> data is selected out of a collection.
> 
> I do wonder whether this is CEP-worthy. The CEP says that the changes will not 
> impact existing users, will be backwards compatible, and overall is an 
> efficiency improvement. The CEP guidelines say a CEP is encouraged “for 
> significant user-facing or changes that cut across multiple subsystems”. Any 
> reason why a Jira isn’t sufficient?
> 
> Abe
> 
>> On Sep 5, 2022, at 1:57 AM, Claude Warren via dev wrote:
>> 
>> I have just posted a CEP covering an Enhancement for Sparse Data 
>> Serialization.  This is in response to CASSANDRA-8959
>> 
>> I look forward to responses.