Re: Capabilities
Benedict, I agree with you that TCM might be overkill for capabilities. It’s truly something that’s fine to be eventually consistent. Riak’s implementation used a local ETS table (ETS is built into Erlang; the equivalent for us would be a local-only system table) and an efficient and reliable gossip protocol. The data was basically a simple CRDT (a map of supported features in preference order, with the only operations being additions and reads).

So I agree with you that we could be using TCM as a hammer for every nail here. But I’m also hesitant to introduce something new. Distributed tables, or a virtual table with some way to aggregate across the cluster, would also work. In either case we would need a local cache (like Denylist).

From a requirements perspective, reads need to be local (because they may be done in a hot path) but writes can be slow (they typically only change on startup or during operator intervention).

Jordan

On Fri, Dec 20, 2024 at 01:53 Benedict wrote:
> If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information.
>
> We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here.
>
> Regarding disagreements: if you use LWTs then there are no consistency issues to worry about.
>
> Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail. It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. Perhaps we could extend its APIs to less critical services without intertwining them with membership, schema and epoch handling.
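To make the Riak-style approach described in this message more concrete, here is a minimal sketch of an add-only capability map with local reads. It is not Riak's or Cassandra's actual implementation. The names `State`, `merge` and `negotiate`, the node ids, and the `feature_x` capability are all hypothetical, and the sketch assumes each node advertises its supported modes in preference order.

```python
from typing import Dict, Optional, Tuple

# Hypothetical state shape: node id -> capability name -> supported modes,
# most preferred first.
State = Dict[str, Dict[str, Tuple[str, ...]]]


def merge(a: State, b: State) -> State:
    """Join two replicas of the capability map.

    Entries are only ever added. If two replicas disagree on the same key,
    a deterministic max() keeps the merge commutative, associative and
    idempotent, which is what lets gossip deliver updates in any order.
    """
    out: State = {node: dict(caps) for node, caps in a.items()}
    for node, caps in b.items():
        mine = out.setdefault(node, {})
        for cap, modes in caps.items():
            mine[cap] = max(mine[cap], modes) if cap in mine else modes
    return out


def negotiate(state: State, local_node: str, cap: str) -> Optional[str]:
    """Local read: the local node's most preferred mode that every known node supports."""
    for mode in state.get(local_node, {}).get(cap, ()):
        if all(mode in caps.get(cap, ()) for caps in state.values()):
            return mode
    return None


# Usage: n1 prefers "v2" but n2 only supports "v1", so the cluster settles on "v1".
n1: State = {"n1": {"feature_x": ("v2", "v1")}}
n2: State = {"n2": {"feature_x": ("v1",)}}
cluster = merge(n1, n2)
chosen = negotiate(cluster, "n1", "feature_x")
```

Reads touch only the local copy of the map, which matches the requirement that reads be cheap enough for a hot path, while writes happen rarely and can propagate slowly.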
Re: Capabilities
One minor clarification: ETS is entirely in memory (unless you explicitly dump it to disk or use DETS) so the equivalence to a local system table is only partially accurate but I think the parallel is fine in the case of what I was describing.

Jordan
Re: Capabilities
Having a parallel and feature-focused TCM log as you suggested seems perfectly reasonable to me.

On Fri, Dec 20, 2024 at 11:33 AM Benedict wrote:
> Guardrails are broadly the same as Auth which works this way, but with less criticality. It’s fine if guardrails are updated slowly.
>
> But, again, TCM is a fine target for this. It would however be nice to have an in-between capability though, TCM-lite if you will, for these features. Perhaps even just a parallel TCM log.
>
> On 20 Dec 2024, at 10:24, Štefan Miklošovič wrote:
>
> What do you mean by a distributed table? You mean these in system_distributed keyspace?
>
> If so, imagine we introduce a table system_distributed.guardrails where each row would hold what a guardrail would be set to, hence on guardrails evaluation in runtime (and there are a bunch of them to consider), it would read this table every single time? Basically performing a select query on this and that guardrail to see what its value is? There are plenty of places where guardrails are evaluated, would not this slow things down considerably?
>
> So, if we do not want to do that, would we start to cache it? So table + cache? Isn't this becoming just too complicated?
>
> With guardrails in TCM, if we commit a transformation from some node that a guardrail xyz changed its state from false to true, this gets propagated to every single node in some epoch eventually so there is no reason to read them from any table. A node would apply this transformation to itself as it digests new epochs it pulled from cms.
>
> The point about hammer and all things being nails resonates with me. I agree we should be cautious about this to not "bastardize" TCM by using it for something unnecessary, but on the other hand we should be open to exploring what such an implementation would mean _in details_ (what we do here) before ruling it out for good.
>
> I am all ears if you guys see how it should work differently, I am still in the process of putting all parts of the puzzle together so please be so nice to prove me wrong.
>
> Regards
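The "table + cache" option questioned in the quoted discussion can be sketched roughly as below. This is only an illustration under stated assumptions: `GuardrailCache` and `load_all` are hypothetical names, `load_all` stands in for a full read of something like the imagined system_distributed.guardrails table, and how and when `refresh` gets called (a timer, an event listener) is left open.

```python
from typing import Callable, Dict, Optional


class GuardrailCache:
    """Hypothetical local cache in front of a distributed guardrails table."""

    def __init__(self, load_all: Callable[[], Dict[str, bool]]) -> None:
        self._load_all = load_all
        self._values: Dict[str, bool] = {}
        self.refresh()

    def refresh(self) -> None:
        # Slow path: one read of the whole table, copied and swapped in.
        self._values = dict(self._load_all())

    def get(self, name: str, default: Optional[bool] = None) -> Optional[bool]:
        # Hot path: a local dictionary lookup, no query is issued.
        return self._values.get(name, default)


# Usage with an in-memory stand-in for the table.
table = {"xyz": False}
loads = []


def load_all() -> Dict[str, bool]:
    loads.append(1)
    return table


cache = GuardrailCache(load_all)
before = cache.get("xyz")
table["xyz"] = True          # an operator changes the guardrail
stale = cache.get("xyz")     # still the old value until the next refresh
cache.refresh()
after = cache.get("xyz")
```

The sketch makes the trade-off visible: guardrail evaluation never reads the table, but a node sees a change only after its next refresh, which is acceptable if guardrails may be updated slowly.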
Re: Cassandra 5 Upgrade - Storage Compatibility Modes
I think after a discussion on #cassandra-dev yesterday, we are going to remove the requirement for schema agreement to deliver hints, as suggested by Jeff Jirsa.

Kind Regards,
Brandon

On Thu, Dec 19, 2024 at 7:43 AM Paul Chandler wrote:
>
> Hi Brandon,
>
> I am not sure which part changes after CASSANDRA-20118, there is still the system mismatch going to CASSANDRA_4 caused by the change in system.compaction_history, and going to UPGRADING, this is caused by the 2 different sstable formats, so nothing that CASSANDRA-20118 fixes.
>
> So while CASSANDRA-20118 improves things, it does not fix these specific issues, unless I have missed something?
>
> > On 19 Dec 2024, at 12:17, Brandon Williams wrote:
> >
> > On Thu, Dec 19, 2024 at 4:11 AM Paul Chandler wrote:
> >> C*4 -> CASSANDRA_4: There is a schema mismatch, and hints are not sent from C*4 node to C*5 nodes.
> >> CASSANDRA_4 -> UPGRADING: Repairs are not possible and Nodes cannot be added or replaced.
> >> UPGRADING -> NONE: No issues.
> >
> > I'll note this will change after CASSANDRA-20118
> >
> >> Any thoughts on whether having SCM controlled by JMX/nodetool is a good idea?
> >
> > I think it's a good idea but it's tricky. As I said on 20118, "An unfortunate consequence of our use of static initialization is that once started, there is no way to change storage compatibility modes" and all the columns are defined statically, so that will have to be overcome.
> >
> > Kind Regards,
> > Brandon
Re: Capabilities
> It should be possible to use distributed system tables just fine for capabilities, config and guardrails.

I have been thinking about this recently and I agree we should be wary about introducing new TCM states and creating additional complexity for something that can be serviced by existing data dissemination mechanisms (gossip/system tables). I would prefer that we take a more phased and incremental approach to introducing new TCM states.

As a way to accomplish that, I have thought about introducing a new generic TCM state "In Maintenance", where schema or membership changes are "frozen/disallowed" while an external operation is taking place. This "external operation" could mean many things:
- Upgrade
- Downgrade
- Migration
- Capability Enablement/Disablement

These could be sub-states of the "Maintenance" TCM state, which could be managed externally (via cache/gossip/system tables/sidecar). Once these sub-states are validated thoroughly and mature enough, we could "promote" them to top-level TCM states.

In the end what really matters is that cluster and schema membership changes do not happen while a miscellaneous operation is taking place.

Would this make sense as an initial way to integrate TCM with the capabilities framework?
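The "In Maintenance" idea from this message could look roughly like the sketch below. It is purely illustrative: TCM has no such state today, and `MaintenanceOperation`, `ClusterState` and their methods are hypothetical names chosen for this example.

```python
from enum import Enum
from typing import Optional


class MaintenanceOperation(Enum):
    """Hypothetical sub-states of a generic maintenance state."""
    UPGRADE = "upgrade"
    DOWNGRADE = "downgrade"
    MIGRATION = "migration"
    CAPABILITY_CHANGE = "capability_change"


class ClusterState:
    def __init__(self) -> None:
        # None means the cluster is not in maintenance.
        self.maintenance: Optional[MaintenanceOperation] = None

    def enter_maintenance(self, operation: MaintenanceOperation) -> None:
        if self.maintenance is not None:
            raise RuntimeError(f"already in maintenance ({self.maintenance.value})")
        self.maintenance = operation

    def exit_maintenance(self) -> None:
        self.maintenance = None

    def check_change_allowed(self, kind: str) -> None:
        # Schema and membership changes are frozen while any operation runs.
        if self.maintenance is not None:
            raise RuntimeError(
                f"{kind} change rejected: cluster is in maintenance ({self.maintenance.value})"
            )


# Usage: a schema change is rejected during an upgrade and allowed afterwards.
state = ClusterState()
state.enter_maintenance(MaintenanceOperation.UPGRADE)
try:
    state.check_change_allowed("schema")
    rejected = False
except RuntimeError:
    rejected = True
state.exit_maintenance()
state.check_change_allowed("schema")
```

Only the coarse "in maintenance or not" flag would need to live in cluster metadata under this proposal; the details of each sub-state could be tracked elsewhere until they mature.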
Re: Capabilities
Jordan,

I also think that having it on TCM would be ideal and we should explore this path first before doing anything custom.

Regarding my idea about the guardrails in TCM, when I prototyped that and wanted to make it happen, there was a little bit of pushback (1) (even though a super reasonable one) that TCM is just too young at the moment and it would be desirable to go through some stabilisation period.

Another idea was that we should not make just guardrails happen but the whole config should be in TCM. From what I put together, Sam / Alex do not seem to be opposed to this idea, rather the opposite, but having a CEP about that is way more involved than having just guardrails there. I consider guardrails to be kind of special and I do not think that having all configuration in TCM (which guardrails are part of) is an absolute must in order to deliver that. I may start with a guardrails CEP and you may explore a Capabilities CEP on TCM too, if that makes sense?

I just wanted to raise the point about the time this would be delivered. If Capabilities are built on TCM and I wanted to do Guardrails on TCM too but was told it is probably too soon, I guess you would experience something similar.

Sam's comment is from May and maybe a lot has changed since then and his comment is not applicable anymore. It would be great to know if we could build on top of the current trunk already or whether we will wait until 5.1/6.0 is delivered.

(1) https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326

On Fri, Dec 20, 2024 at 2:17 AM Jordan West wrote:
> Firstly, glad to see the support and enthusiasm here and in the recent Slack discussion. I think there is enough for me to start drafting a CEP.
>
> Stefan, global configuration and capabilities do have some overlap but not full overlap. For example, you may want to set globally that a cluster enables feature X or control the threshold for a guardrail but you still need to know if all nodes support feature X or have that guardrail; the latter is what capabilities targets. I do think capabilities are a step towards supporting global configuration and the work you described is another step (that we could do after capabilities or in parallel with them in mind). I am also supportive of exploring global configuration for the reasons you mentioned.
>
> In terms of how capabilities get propagated across the cluster, I hadn't put much thought into it yet past likely TCM since this will be a new feature that lands after TCM. In Riak, we had gossip (but more mature than C*'s -- this was an area I contributed to a lot so very familiar) to disseminate less critical information such as capabilities and a separate layer that did TCM. Since we don't have this in C* I don't think we would want to build a separate distribution channel for capabilities metadata when we already have TCM in place. But I plan to explore this more as I draft the CEP.
>
> Jordan
>
> On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič wrote:
>
>> Hi Jordan,
>>
>> what would this look like from the implementation perspective? I was experimenting with transactional guardrails where an operator would control the content of a virtual table which would be backed by TCM so whatever guardrail we would change, this would be automatically and transparently propagated to every node in a cluster. The POC worked quite nicely. TCM is just a vehicle to commit a change which would spread around and all these settings would survive restarts. We would have the same configuration everywhere which is not currently the case because guardrails are configured per node and if not persisted to yaml, on restart their values would be forgotten.
>>
>> Guardrails are just an example, what is quite obvious is to expand this idea to the whole configuration in yaml. Of course, not all properties in yaml make sense to be the same cluster-wise (ip addresses etc ...), but the ones which do would be again set everywhere the same way.
>>
>> The approach I described above is that we make sure that the configuration is the same everywhere, hence there can be no misunderstanding what features this or that node has, if we say that all nodes have to have a particular feature because we said so in the TCM log, so on restart / replay, a node will "catch up" with whatever features it is asked to turn on.
>>
>> Your approach seems to be that we distribute what all capabilities / features a cluster supports and that each individual node configures itself in some way or not to comply?
>>
>> Is there any intersection in these approaches? At first sight it seems somehow related. How is one different from another from your point of view?
>>
>> Regards
>>
>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19593
>>
>> On Thu, Dec 19, 2024 at 12:00 AM Jordan West
Re: Capabilities
TCM is a perfectly valid basis for this, but TCM is only really *necessary* to solve meta config problems where we can’t rely on the rest of the database working. Particularly placement issues, which is why schema and membership need to live there.

It should be possible to use distributed system tables just fine for capabilities, config and guardrails.

That said, it’s possible config might be better represented as part of the schema (and we already store some relevant config there) in which case it would live in TCM automatically. Migrating existing configs to a distributed setup will be fun however we do it though.

Capabilities also feel naturally related to other membership information, so TCM might be the most suitable place, particularly for handling downgrades after capabilities have been enabled (if we ever expect to support turning off capabilities and then downgrading - which today we mostly don’t).
Re: Capabilities
I find TCM way more comfortable to work with. Having the log replayed on restart, with a node catching up on everything else automatically, is a godsend. If we wanted that on "good old distributed tables", is it not true that we would need to take extra care of it, e.g. we would need to repair the table etc.? That might be a source of discrepancies / disagreements. TCM is just "maintenance-free" and _just works_.

I think I was also investigating distributed tables but was just pulled towards TCM naturally because of its goodies.
Re: Capabilities
If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information. We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here. Regarding disagreements: if you use LWTs then there are no consistency issues to worry about. Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail. It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. Perhaps we could extend its APIs to less critical services without intertwining them with membership, schema and epoch handling. On 20 Dec 2024, at 09:43, Štefan Miklošovič wrote: I find TCM way more comfortable to work with. The capability of log being replayed on restart and catching up with everything else automatically is god-sent. If we had that on "good old distributed tables", then is it not true that we would need to take extra care of that, e.g. we would need to repair it etc ... It might be the source of the discrepancies / disagreements etc. TCM is just "maintenance-free" and _just works_. I think I was also investigating distributed tables but was just pulled towards TCM naturally because of its goodies. On Fri, Dec 20, 2024 at 10:08 AM Benedict wrote: TCM is a perfectly valid basis for this, but TCM is only really *necessary* to solve meta config problems where we can’t rely on the rest of the database working. 
Particularly placement issues, which is why schema and membership need to live there. It should be possible to use distributed system tables just fine for capabilities, config and guardrails. That said, it’s possible config might be better represented as part of the schema (and we already store some relevant config there) in which case it would live in TCM automatically. Migrating existing configs to a distributed setup will be fun however we do it though. Capabilities also feel naturally related to other membership information, so TCM might be the most suitable place, particularly for handling downgrades after capabilities have been enabled (if we ever expect to support turning off capabilities and then downgrading - which today we mostly don’t). On 20 Dec 2024, at 08:42, Štefan Miklošovič wrote: Jordan, I also think that having it on TCM would be ideal and we should explore this path first before doing anything custom. Regarding my idea about the guardrails in TCM, when I prototyped that and wanted to make it happen, there was a little bit of a pushback (1) (even though super reasonable one) that TCM is just too young at the moment and it would be desirable to go through some stabilisation period. Another idea was that we should not make just guardrails happen but the whole config should be in TCM. From what I put together, Sam / Alex does not seem to be opposed to this idea, rather the opposite, but having CEP about that is way more involved than having just guardrails there. I consider guardrails to be kind of special and I do not think that having all configurations in TCM (which guardrails are part of) is the absolute must in order to deliver that. I may start with guardrails CEP and you may explore Capabilities CEP on TCM too, if that makes sense? I just wanted to raise the point about the time this would be delivered. 
If Capabilities are built on TCM and I wanted to do Guardrails on TCM too but was explained it is probably too soon, I guess you would experience something similar. Sam's comment is from May and maybe a lot has changed since in then and his comment is not applicable anymore. It would be great to know if we could build on top of the current trunk already or we will wait until 5.1/6.0 is delivered. (1) https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326 On Fri, Dec 20, 2024 at 2:17 AM Jordan West wrote: Firstly, glad to see the support and enthusiasm here and in the recent Slack discussion. I think there is enough for me to start drafting a CEP. Stefan, global configuration and capabilities do have some overlap but not full overlap. For example, you may want to set globally that a cluster enables feature X or control the threshold for a guardrail but you still need to know if all nodes support feature X or have that guardrail, the latter is what capabilities targets. I do think capabilities are a step towards supporting global configuration and the work you described is another step (that we could do after capabilities or in parallel with them in mind). I am also supportive of exploring global configuration for the reasons you mentioned. In terms of how capabilities get propagated across the cluster, I had
Re: [DISCUSS] Index selection syntax for CASSANDRA-18112
So that would look something like... SELECT ... FROM ... WHERE ... WITH OPTIONS = { 'exclude_indexes' : [, ] } On Fri, Dec 20, 2024 at 5:36 PM Caleb Rackliffe wrote: > You mean like to control the tokenization/analysis of query terms? > > On Fri, Dec 20, 2024 at 4:38 PM Jeremiah Jordan > wrote: > >> Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}. If we >> move into allowing analysis/tokenization on indexed items, then a more >> general WITH OPTIONS would be useful for that too. That would let us add >> any other new options to a SELECT without needing to modify the grammar >> further. >> >> -Jeremiah >> >> On Dec 20, 2024 at 2:28:58 PM, Caleb Rackliffe >> wrote: >> >>> Some of your are probably familiar with work in the DS fork to improve >>> the selection of indexes for SAI queries in >>> https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424 >>> . >>> >>> While I'm eagerly anticipating working on that in the new year, I'm also >>> wondering whether we think some simple CQL extensions to manually control >>> index selection would be helpful. Maxwell proposed this a while back >>> in CASSANDRA-18112, and I'd like to propose a syntax: >>> >>> >>> ex. Do not use the specified index during the query. >>> >>> SELECT ... FROM ... WHERE ... WITHOUT INDEX >>> >>> This could be helpful for intersection queries where one of the provided >>> clauses is not very selective and could simply be handled via >>> post-filtering. >>> >>> ex. Require the specified index to be used. >>> >>> SELECT ... FROM ... WHERE ... WITH INDEX >>> >>> This could be helpful in scenarios where multiple indexes exist on a >>> column and was the primary motivation for CASSANDRA-18112. >>> >>> Thoughts? >>> >>
Re: [DISCUSS] Index selection syntax for CASSANDRA-18112
You mean like to control the tokenization/analysis of query terms? On Fri, Dec 20, 2024 at 4:38 PM Jeremiah Jordan wrote: > Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}. If we > move into allowing analysis/tokenization on indexed items, then a more > general WITH OPTIONS would be useful for that too. That would let us add > any other new options to a SELECT without needing to modify the grammar > further. > > -Jeremiah > > On Dec 20, 2024 at 2:28:58 PM, Caleb Rackliffe > wrote: > >> Some of your are probably familiar with work in the DS fork to improve >> the selection of indexes for SAI queries in >> https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424 >> . >> >> While I'm eagerly anticipating working on that in the new year, I'm also >> wondering whether we think some simple CQL extensions to manually control >> index selection would be helpful. Maxwell proposed this a while back >> in CASSANDRA-18112, and I'd like to propose a syntax: >> >> >> ex. Do not use the specified index during the query. >> >> SELECT ... FROM ... WHERE ... WITHOUT INDEX >> >> This could be helpful for intersection queries where one of the provided >> clauses is not very selective and could simply be handled via >> post-filtering. >> >> ex. Require the specified index to be used. >> >> SELECT ... FROM ... WHERE ... WITH INDEX >> >> This could be helpful in scenarios where multiple indexes exist on a >> column and was the primary motivation for CASSANDRA-18112. >> >> Thoughts? >> >
Re: [DISCUSS] Index selection syntax for CASSANDRA-18112
WITH INDEX (or something equivalent) seems really useful. Less opinionated on the specific syntax, but I think there is a lot of value in the form of predictable, controllable performance, in giving developers more direct control over query execution, whether that's index selection or even lower-level decisions. If you've experienced the thrill of operating a database with a cost-based planner that abruptly selects a new, sub-optimal plan due to a change in statistics or configuration, you'll appreciate language features that yield some planning control back to you. It does increase the burden on the developer to understand how best to execute the query, but it makes their intent much more obvious, and easier to adjust as the system changes. -- Joel. On 12/20/2024 12:28 PM, Caleb Rackliffe wrote: Some of your are probably familiar with work in the DS fork to improve the selection of indexes for SAI queries in https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424. While I'm eagerly anticipating working on that in the new year, I'm also wondering whether we think some simple CQL extensions to manually control index selection would be helpful. Maxwell proposed this a while back in CASSANDRA-18112, and I'd like to propose a syntax: ex. Do not use the specified index during the query. SELECT ... FROM ... WHERE ... WITHOUT INDEX This could be helpful for intersection queries where one of the provided clauses is not very selective and could simply be handled via post-filtering. ex. Require the specified index to be used. SELECT ... FROM ... WHERE ... WITH INDEX This could be helpful in scenarios where multiple indexes exist on a column and was the primary motivation for CASSANDRA-18112. Thoughts?
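As a rough illustration of the two hints proposed in this thread, the sketch below shows one way a planner could honor them. It is not Cassandra code: `choose_indexes`, the shape of `candidates`, and the index names are all hypothetical, and a real planner would also weigh selectivity, which is left out here.

```python
def choose_indexes(candidates, with_index=None, without_index=()):
    """Pick indexes for a query while honoring manual hints.

    candidates:    dict of index name -> indexed column (hypothetical shape).
    with_index:    name of an index that must be used, or None.
    without_index: names of indexes that must not be used.

    Returns (index names to use, columns left for post-filtering).
    """
    excluded = set(without_index)
    hinted = excluded | ({with_index} if with_index is not None else set())
    unknown = sorted(hinted - set(candidates))
    if unknown:
        raise ValueError(f"unknown index(es): {unknown}")
    if with_index in excluded:
        raise ValueError(f"index both required and excluded: {with_index}")

    used = {}  # column -> chosen index name
    for name, column in candidates.items():
        if name in excluded:
            continue  # WITHOUT INDEX: never consider this index
        if name == with_index or column not in used:
            # WITH INDEX wins over any other index on the same column;
            # otherwise the first usable index on a column is kept.
            used[column] = name

    # Columns with no usable index are handled by post-filtering.
    post_filter = sorted(set(candidates.values()) - set(used))
    return sorted(used.values()), post_filter
```

With two indexes on column `b`, excluding the index on `a` pushes `a` to post-filtering, while requiring `idx_b_alt` resolves the ambiguity on `b`, which matches the two motivating cases in the proposal.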
Re: [DISCUSS] Index selection syntax for CASSANDRA-18112
> > On Fri, Dec 20, 2024 at 5:36 PM Caleb Rackliffe > wrote: > >> You mean like to control the tokenization/analysis of query terms? >> > Yes. Elastic for example lets you specify the query time analyzer in the query, over riding what is specified at the index level. https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html#specify-search-query-analyzer On Dec 20, 2024 at 5:37:58 PM, Caleb Rackliffe wrote: > So that would look something like... > > SELECT ... FROM ... WHERE ... WITH OPTIONS = { 'exclude_indexes' : > [, ] } > Yeah something like that would work. On Dec 20, 2024 at 5:37:58 PM, Caleb Rackliffe wrote: > So that would look something like... > > SELECT ... FROM ... WHERE ... WITH OPTIONS = { 'exclude_indexes' : > [, ] } > > On Fri, Dec 20, 2024 at 5:36 PM Caleb Rackliffe > wrote: > >> You mean like to control the tokenization/analysis of query terms? >> >> On Fri, Dec 20, 2024 at 4:38 PM Jeremiah Jordan < >> jeremiah.jor...@gmail.com> wrote: >> >>> Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}. If we >>> move into allowing analysis/tokenization on indexed items, then a more >>> general WITH OPTIONS would be useful for that too. That would let us add >>> any other new options to a SELECT without needing to modify the grammar >>> further. >>> >>> -Jeremiah >>> >>> On Dec 20, 2024 at 2:28:58 PM, Caleb Rackliffe >>> wrote: >>> Some of your are probably familiar with work in the DS fork to improve the selection of indexes for SAI queries in https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424 . While I'm eagerly anticipating working on that in the new year, I'm also wondering whether we think some simple CQL extensions to manually control index selection would be helpful. Maxwell proposed this a while back in CASSANDRA-18112, and I'd like to propose a syntax: ex. Do not use the specified index during the query. 
SELECT ... FROM ... WHERE ... WITHOUT INDEX This could be helpful for intersection queries where one of the provided clauses is not very selective and could simply be handled via post-filtering. ex. Require the specified index to be used. SELECT ... FROM ... WHERE ... WITH INDEX This could be helpful in scenarios where multiple indexes exist on a column and was the primary motivation for CASSANDRA-18112. Thoughts? >>>
Re: Capabilities
I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super reasonable to be put there. On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič wrote: > I am super hesitant to base distributed guardrails or any configuration > for that matter on anything but TCM. Does not "C" in TCM stand for > "configuration" anyway? So rename it to TSM like "schema" then if it is > meant to be just for that. It seems to be quite ridiculous to code tables > with caches on top when we have way more effective tooling thanks to CEP-21 > to deal with that with clear advantages of getting rid of all of that old > mechanism we have in place. > > I have not seen any concrete examples of risks why using TCM should be > just for what it is currently for. Why not put the configuration meant to > be cluster-wide into that? > > What is it ... performance? What does even the term "additional > complexity" mean? Complex in what? Do you think that putting there 3 types > of transformations in case of guardrails which flip some booleans and > numbers would suddenly make TCM way more complex? Come on ... > > This has nothing to do with what Jordan is trying to introduce. I think we > all agree he knows what he is doing and if he evaluates that TCM is too > much for his use case (or it is not a good fit) that is perfectly fine. > > On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta wrote: > >> > It should be possible to use distributed system tables just fine for >> capabilities, config and guardrails. >> >> I have been thinking about this recently and I agree we should be wary >> about introducing new TCM states and create additional complexity that can >> be serviced by existing data dissemination mechanisms (gossip/system >> tables). I would prefer that we take a more phased and incremental approach >> to introduce new TCM states. 
>> >> As a way to accomplish that, I have thought about introducing a new >> generic TCM state "In Maintenance", where schema or membership changes are >> "frozen/disallowed" while an external operation is taking place. This >> "external operation" could mean many things: >> - Upgrade >> - Downgrade >> - Migration >> - Capability Enablement/Disablement >> >> These could be sub-states of the "Maintenance" TCM state, that could be >> managed externally (via cache/gossip/system tables/sidecar). Once these >> sub-states are validated thouroughly and mature enough, we could "promote" >> them to top-level TCM states. >> >> In the end what really matters is that cluster and schema membership >> changes do not happen while a miscellaneous operation is taking place. >> >> Would this make sense as an initial way to integrate TCM with >> capabilities framework ? >> >> On Fri, Dec 20, 2024 at 4:53 AM Benedict wrote: >> >>> If you perform a read from a distributed table on startup you will find >>> the latest information. What catchup are you thinking of? I don’t think any >>> of the features we talked about need a log, only the latest information. >>> >>> We can (and should) probably introduce event listeners for distributed >>> tables, as this is also a really great feature, but I don’t think this >>> should be necessary here. >>> >>> Regarding disagreements: if you use LWTs then there are no consistency >>> issues to worry about. >>> >>> Again, I’m not opposed to using TCM, although I am a little worried TCM >>> is becoming our new hammer with everything a nail. It would be better IMO >>> to keep TCM scoped to essential functionality as it’s critical to >>> correctness. Perhaps we could extend its APIs to less critical services >>> without intertwining them with membership, schema and epoch handling. >>> >>> On 20 Dec 2024, at 09:43, Štefan Miklošovič >>> wrote: >>> >>> >>> I find TCM way more comfortable to work with. 
The capability of log >>> being replayed on restart and catching up with everything else >>> automatically is god-sent. If we had that on "good old distributed tables", >>> then is it not true that we would need to take extra care of that, e.g. we >>> would need to repair it etc ... It might be the source of the discrepancies >>> / disagreements etc. TCM is just "maintenance-free" and _just works_. >>> >>> I think I was also investigating distributed tables but was just pulled >>> towards TCM naturally because of its goodies. >>> >>> On Fri, Dec 20, 2024 at 10:08 AM Benedict wrote: >>> TCM is a perfectly valid basis for this, but TCM is only really *necessary* to solve meta config problems where we can’t rely on the rest of the database working. Particularly placement issues, which is why schema and membership need to live there. It should be possible to use distributed system tables just fine for capabilities, config and guardrails. That said, it’s possible config might be better represented as part of the schema (and we already store some relevant config there) in which case it would live in TCM automatically. Migrating existing configs t
Re: [DISCUSS] Index selection syntax for CASSANDRA-18112
Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}. If we move into allowing analysis/tokenization on indexed items, then a more general WITH OPTIONS would be useful for that too. That would let us add any other new options to a SELECT without needing to modify the grammar further. -Jeremiah On Dec 20, 2024 at 2:28:58 PM, Caleb Rackliffe wrote: > Some of your are probably familiar with work in the DS fork to improve the > selection of indexes for SAI queries in > https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424 > . > > While I'm eagerly anticipating working on that in the new year, I'm also > wondering whether we think some simple CQL extensions to manually control > index selection would be helpful. Maxwell proposed this a while back > in CASSANDRA-18112, and I'd like to propose a syntax: > > > ex. Do not use the specified index during the query. > > SELECT ... FROM ... WHERE ... WITHOUT INDEX > > This could be helpful for intersection queries where one of the provided > clauses is not very selective and could simply be handled via > post-filtering. > > ex. Require the specified index to be used. > > SELECT ... FROM ... WHERE ... WITH INDEX > > This could be helpful in scenarios where multiple indexes exist on a > column and was the primary motivation for CASSANDRA-18112. > > Thoughts? >
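The point about not touching the grammar again can be sketched as follows: the parser hands over a generic map, and each option is validated by a handler looked up by name. This is only an illustration under assumptions. `OPTION_VALIDATORS`, `option`, and `validate_options` are hypothetical names, and `exclude_indexes` is the one option key that actually appears in this thread.

```python
# Hypothetical registry: option name -> validator for that option's value.
OPTION_VALIDATORS = {}


def option(name):
    """Register a validator for one WITH OPTIONS key."""
    def register(fn):
        OPTION_VALIDATORS[name] = fn
        return fn
    return register


@option("exclude_indexes")
def _exclude_indexes(value):
    # Expect a list of index names, e.g. ['idx_a', 'idx_b'].
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("exclude_indexes must be a list of index names")
    return tuple(value)


def validate_options(raw):
    """Validate the generic map the parser produced for WITH OPTIONS.

    Unknown keys are rejected so that typos do not pass silently.
    """
    unknown = sorted(set(raw) - set(OPTION_VALIDATORS))
    if unknown:
        raise ValueError(f"unknown SELECT options: {unknown}")
    return {name: OPTION_VALIDATORS[name](value) for name, value in raw.items()}
```

Adding a query-time analyzer option later would then mean registering one more validator, with no change to how the statement is parsed.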
Re: Cassandra 5 Upgrade - Storage Compatibility Modes
Hi Brandon, That sounds good. Will that fix be in 4.1, as it is the old nodes that don’t transmit the hints? Thanks Paul > On 20 Dec 2024, at 13:41, Brandon Williams wrote: > > I think after a discussion on #cassandra-dev yesterday, we are going > to remove the requirement for schema agreement to deliver hints, as > suggested by Jeff Jirsa. > > Kind Regards, > Brandon > > On Thu, Dec 19, 2024 at 7:43 AM Paul Chandler wrote: >> >> Hi Brandon, >> >> I am not sure which part changes after CASSANDRA-20118, there is still the >> system mismatch going to CASSANDRA_4 caused by the change in >> system.compaction_history, and going to UPGRADING, this is caused by the 2 >> different sstable formats, so nothing that CASSANDRA-20118 fixes. >> >> So while CASSANDRA-20118 improves things, it does not fix these specific >> issues, unless I have missed something? >> >>> On 19 Dec 2024, at 12:17, Brandon Williams wrote: >>> >>> On Thu, Dec 19, 2024 at 4:11 AM Paul Chandler wrote: C*4 -> CASSANDRA_4 : There is a schema mismatch, and hints are not sent from C*4 node to C*5 nodes. CASSANDRA_4 -> UPGRADING: Repairs are not possible and Nodes cannot be added or replaced. UPGRADING-> NONE: No issues. >>> >>> I'll note this will change after CASSANDRA-20118 >>> Any thoughts on whether having SCM controlled by JMX/nodetool is a good idea? >>> >>> I think it's a good idea but it's tricky. As I said on 20118, "An >>> unfortunate consequence of our use of static initialization is that >>> once started, there is no way to change storage compatibility modes" >>> and all the columns are defined statically, so that will have to be >>> overcome. >>> >>> Kind Regards, >>> Brandon >>
Re: Capabilities
What do you mean by a distributed table? You mean these in system_distributed keyspace? If so, imagine we introduce a table system_distributed.guardrails where each row would hold what a guardrail would be set to, hence on guardrails evaluation in runtime (and there are a bunch of them to consider), it would read this table every single time? Basically performing a select query on this and that guardrail to see what its value is? There are plenty of places where guardrails are evaluated, would not this slow things down considerably? So, if we do not want to do that, would we start to cache it? So table + cache? Isn't this becoming just too complicated? With guardrails in TCM, if we commit a transformation from some node that a guardrail xyz changed its state from false to true, this gets propagated to every single node in some epoch eventually so there is no reason to read them from any table. A node would apply this transformation to itself as it digests new epochs it pulled from cms. The point about hammer and all things being nails resonates with me. I agree we should be cautious about this to not "bastardize" TCM by using it for something unnecessary, but on the other hand we should be open to exploring what such an implementation would mean _in details_ (what we do here) before ruling it out for good. I am all ears if you guys see how it should work differently, I am still in the process of putting all parts of the puzzle together so please be so nice to prove me wrong. Regards On Fri, Dec 20, 2024 at 10:53 AM Benedict wrote: > If you perform a read from a distributed table on startup you will find > the latest information. What catchup are you thinking of? I don’t think any > of the features we talked about need a log, only the latest information. > > We can (and should) probably introduce event listeners for distributed > tables, as this is also a really great feature, but I don’t think this > should be necessary here. 
> > Regarding disagreements: if you use LWTs then there are no consistency > issues to worry about. > > Again, I’m not opposed to using TCM, although I am a little worried TCM is > becoming our new hammer with everything a nail. It would be better IMO to > keep TCM scoped to essential functionality as it’s critical to correctness. > Perhaps we could extend its APIs to less critical services without > intertwining them with membership, schema and epoch handling. > > On 20 Dec 2024, at 09:43, Štefan Miklošovič > wrote: > > > I find TCM way more comfortable to work with. The capability of log being > replayed on restart and catching up with everything else automatically is > god-sent. If we had that on "good old distributed tables", then is it not > true that we would need to take extra care of that, e.g. we would need to > repair it etc ... It might be the source of the discrepancies / > disagreements etc. TCM is just "maintenance-free" and _just works_. > > I think I was also investigating distributed tables but was just pulled > towards TCM naturally because of its goodies. > > On Fri, Dec 20, 2024 at 10:08 AM Benedict wrote: > >> TCM is a perfectly valid basis for this, but TCM is only really >> *necessary* to solve meta config problems where we can’t rely on the rest >> of the database working. Particularly placement issues, which is why schema >> and membership need to live there. >> >> It should be possible to use distributed system tables just fine for >> capabilities, config and guardrails. >> >> That said, it’s possible config might be better represented as part of >> the schema (and we already store some relevant config there) in which case >> it would live in TCM automatically. Migrating existing configs to a >> distributed setup will be fun however we do it though. 
>> >> Capabilities also feel naturally related to other membership information, >> so TCM might be the most suitable place, particularly for handling >> downgrades after capabilities have been enabled (if we ever expect to >> support turning off capabilities and then downgrading - which today we >> mostly don’t). >> >> On 20 Dec 2024, at 08:42, Štefan Miklošovič >> wrote: >> >> >> Jordan, >> >> I also think that having it on TCM would be ideal and we should explore >> this path first before doing anything custom. >> >> Regarding my idea about the guardrails in TCM, when I prototyped that and >> wanted to make it happen, there was a little bit of a pushback (1) (even >> though super reasonable one) that TCM is just too young at the moment and >> it would be desirable to go through some stabilisation period. >> >> Another idea was that we should not make just guardrails happen but the >> whole config should be in TCM. From what I put together, Sam / Alex does >> not seem to be opposed to this idea, rather the opposite, but having CEP >> about that is way more involved than having just guardrails there. I >> consider guardrails to be kind of special and I do not think that having >> all configurations
Re: Capabilities
Guardrails are broadly the same as Auth which works this way, but with less criticality. It’s fine if guardrails are updated slowly. But, again, TCM is a fine target for this. It would however be nice to have an in-between capability though, TCM-lite if you will, for these features. Perhaps even just a parallel TCM log. On 20 Dec 2024, at 10:24, Štefan Miklošovič wrote: What do you mean by a distributed table? You mean these in system_distributed keyspace? If so, imagine we introduce a table system_distributed.guardrails where each row would hold what a guardrail would be set to, hence on guardrails evaluation in runtime (and there are a bunch of them to consider), it would read this table every single time? Basically performing a select query on this and that guardrail to see what its value is? There are plenty of places where guardrails are evaluated, would not this slow things down considerably? So, if we do not want to do that, would we start to cache it? So table + cache? Isn't this becoming just too complicated? With guardrails in TCM, if we commit a transformation from some node that a guardrail xyz changed its state from false to true, this gets propagated to every single node in some epoch eventually so there is no reason to read them from any table. A node would apply this transformation to itself as it digests new epochs it pulled from cms. The point about hammer and all things being nails resonates with me. I agree we should be cautious about this to not "bastardize" TCM by using it for something unnecessary, but on the other hand we should be open to exploring what such an implementation would mean _in details_ (what we do here) before ruling it out for good. I am all ears if you guys see how it should work differently, I am still in the process of putting all parts of the puzzle together so please be so nice to prove me wrong. 
Regards On Fri, Dec 20, 2024 at 10:53 AM Benedict wrote: If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information. We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here. Regarding disagreements: if you use LWTs then there are no consistency issues to worry about. Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail. It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. Perhaps we could extend its APIs to less critical services without intertwining them with membership, schema and epoch handling. On 20 Dec 2024, at 09:43, Štefan Miklošovič wrote: I find TCM way more comfortable to work with. The capability of log being replayed on restart and catching up with everything else automatically is god-sent. If we had that on "good old distributed tables", then is it not true that we would need to take extra care of that, e.g. we would need to repair it etc ... It might be the source of the discrepancies / disagreements etc. TCM is just "maintenance-free" and _just works_. I think I was also investigating distributed tables but was just pulled towards TCM naturally because of its goodies. On Fri, Dec 20, 2024 at 10:08 AM Benedict wrote: TCM is a perfectly valid basis for this, but TCM is only really *necessary* to solve meta config problems where we can’t rely on the rest of the database working. 
Particularly placement issues, which is why schema and membership need to live there. It should be possible to use distributed system tables just fine for capabilities, config and guardrails. That said, it’s possible config might be better represented as part of the schema (and we already store some relevant config there) in which case it would live in TCM automatically. Migrating existing configs to a distributed setup will be fun however we do it though. Capabilities also feel naturally related to other membership information, so TCM might be the most suitable place, particularly for handling downgrades after capabilities have been enabled (if we ever expect to support turning off capabilities and then downgrading - which today we mostly don’t). On 20 Dec 2024, at 08:42, Štefan Miklošovič wrote: Jordan, I also think that having it on TCM would be ideal and we should explore this path first before doing anything custom. Regarding my idea about the guardrails in TCM, when I prototyped that and wanted to make it happen, there was a little bit of a pushback (1) (even though super reasonable one) that TCM is just too young at the moment and it would be desirable to go through some stabilisation period. Another idea was that we should not make just guardrails happen but the whole config should be in TCM. From what I put together, Sa
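The propagation model described in this message (a committed transformation flips a guardrail, each node applies it as it digests new epochs, and evaluation never reads a table) can be sketched roughly as below. This is only an illustration under stated assumptions: `SetGuardrail`, `NodeGuardrails` and `catch_up` are hypothetical names invented for the sketch and are not Cassandra or TCM APIs, and the real log, epoch and serialization machinery is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Union

GuardrailValue = Union[bool, int]


@dataclass(frozen=True)
class SetGuardrail:
    """Hypothetical transformation: set one guardrail to a new value."""
    epoch: int
    name: str
    value: GuardrailValue


@dataclass
class NodeGuardrails:
    """Per-node view, changed only by applying committed transformations."""
    epoch: int = 0
    values: Dict[str, GuardrailValue] = field(default_factory=dict)

    def apply(self, t: SetGuardrail) -> None:
        # Transformations are applied strictly in epoch order.
        if t.epoch != self.epoch + 1:
            raise ValueError(f"expected epoch {self.epoch + 1}, got {t.epoch}")
        self.values[t.name] = t.value
        self.epoch = t.epoch

    def get(self, name: str, default: GuardrailValue) -> GuardrailValue:
        # Hot-path read: a local dictionary lookup, no query is issued.
        return self.values.get(name, default)


def catch_up(node: NodeGuardrails, log: Iterable[SetGuardrail]) -> None:
    """Apply every log entry the node has not seen yet (e.g. after a restart)."""
    for t in sorted(log, key=lambda entry: entry.epoch):
        if t.epoch > node.epoch:
            node.apply(t)
```

A node that replays the same log twice ends up in the same state, which is the "catching up automatically" property mentioned earlier in the thread; whether guardrails need a log at all, rather than only the latest value, is exactly the point under discussion.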
Re: Cassandra 5 Upgrade - Storage Compatibility Modes
That sounds like a possibility to me on the surface. Kind Regards, Brandon On Fri, Dec 20, 2024 at 8:42 AM Paul Chandler wrote: > > Hi Brandon, > > That sounds good. Will that fix be in 4.1, as it is the old nodes that don’t > transmit the hints? > > Thanks > > Paul > > > On 20 Dec 2024, at 13:41, Brandon Williams wrote: > > > > I think after a discussion on #cassandra-dev yesterday, we are going > > to remove the requirement for schema agreement to deliver hints, as > > suggested by Jeff Jirsa. > > > > Kind Regards, > > Brandon > > > > On Thu, Dec 19, 2024 at 7:43 AM Paul Chandler wrote: > >> > >> Hi Brandon, > >> > >> I am not sure which part changes after CASSANDRA-20118, there is still the > >> system mismatch going to CASSANDRA_4 caused by the change in > >> system.compaction_history, and going to UPGRADING, this is caused by the 2 > >> different sstable formats, so nothing that CASSANDRA-20118 fixes. > >> > >> So while CASSANDRA-20118 improves things, it does not fix these specific > >> issues, unless I have missed something? > >> > >>> On 19 Dec 2024, at 12:17, Brandon Williams wrote: > >>> > >>> On Thu, Dec 19, 2024 at 4:11 AM Paul Chandler wrote: > C*4 -> CASSANDRA_4 : There is a schema mismatch, and hints are not sent > from C*4 node to C*5 nodes. > CASSANDRA_4 -> UPGRADING: Repairs are not possible and Nodes cannot be > added or replaced. > UPGRADING-> NONE: No issues. > >>> > >>> I'll note this will change after CASSANDRA-20118 > >>> > Any thoughts on whether having SCM controlled by JMX/nodetool is a good > idea? > >>> > >>> I think it's a good idea but it's tricky. As I said on 20118, "An > >>> unfortunate consequence of our use of static initialization is that > >>> once started, there is no way to change storage compatibility modes" > >>> and all the columns are defined statically, so that will have to be > >>> overcome. > >>> > >>> Kind Regards, > >>> Brandon > >> >
Re: Capabilities
Apologies I missed the forked thread "Re: Capabilities" before commenting on this. I think the TCM-lite suggestion there is not incompatible with the generic "In Maintenance" TCM state that I am proposing, since while in this state each individual feature could also have their independent/parallel TCM-lite log separated from the main cluster membership log. > I am super hesitant to base distributed guardrails or any configuration for that matter on anything but TCM. This is deviating from the thread, but would this not be handled by this: > "it’s possible config might be better represented as part of the schema (and we already store some relevant config there) in which case it would live in TCM automatically. Migrating existing configs to a distributed setup will be fun however we do it though." I have to admit I'm not familiar with the distributed guardrails proposal to comment on it. I am just expressing that a generic TCM state like "In Maintenance" could allow the TCM state machine to be "paused" while a miscellaneous operation not part of TCM is taking place. This allows more flexibility for operations to be externally managed without requiring that *everything uses TCM*, potentially introducing correctness and instability risks. This does not mean that distributed guardrails or capabilities cannot be integrated with TCM if it makes sense. On Fri, Dec 20, 2024 at 1:43 PM Štefan Miklošovič wrote: > I am super hesitant to base distributed guardrails or any configuration > for that matter on anything but TCM. Does not "C" in TCM stand for > "configuration" anyway? So rename it to TSM like "schema" then if it is > meant to be just for that. It seems to be quite ridiculous to code tables > with caches on top when we have way more effective tooling thanks to CEP-21 > to deal with that with clear advantages of getting rid of all of that old > mechanism we have in place. 
> > I have not seen any concrete examples of risks why using TCM should be > just for what it is currently for. Why not put the configuration meant to > be cluster-wide into that? > > What is it ... performance? What does even the term "additional > complexity" mean? Complex in what? Do you think that putting there 3 types > of transformations in case of guardrails which flip some booleans and > numbers would suddenly make TCM way more complex? Come on ... > > This has nothing to do with what Jordan is trying to introduce. I think we > all agree he knows what he is doing and if he evaluates that TCM is too > much for his use case (or it is not a good fit) that is perfectly fine. > > On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta wrote: > >> > It should be possible to use distributed system tables just fine for >> capabilities, config and guardrails. >> >> I have been thinking about this recently and I agree we should be wary >> about introducing new TCM states and create additional complexity that can >> be serviced by existing data dissemination mechanisms (gossip/system >> tables). I would prefer that we take a more phased and incremental approach >> to introduce new TCM states. >> >> As a way to accomplish that, I have thought about introducing a new >> generic TCM state "In Maintenance", where schema or membership changes are >> "frozen/disallowed" while an external operation is taking place. This >> "external operation" could mean many things: >> - Upgrade >> - Downgrade >> - Migration >> - Capability Enablement/Disablement >> >> These could be sub-states of the "Maintenance" TCM state, that could be >> managed externally (via cache/gossip/system tables/sidecar). Once these >> sub-states are validated thouroughly and mature enough, we could "promote" >> them to top-level TCM states. >> >> In the end what really matters is that cluster and schema membership >> changes do not happen while a miscellaneous operation is taking place. 
>> >> Would this make sense as an initial way to integrate TCM with >> capabilities framework ? >> >> On Fri, Dec 20, 2024 at 4:53 AM Benedict wrote: >> >>> If you perform a read from a distributed table on startup you will find >>> the latest information. What catchup are you thinking of? I don’t think any >>> of the features we talked about need a log, only the latest information. >>> >>> We can (and should) probably introduce event listeners for distributed >>> tables, as this is also a really great feature, but I don’t think this >>> should be necessary here. >>> >>> Regarding disagreements: if you use LWTs then there are no consistency >>> issues to worry about. >>> >>> Again, I’m not opposed to using TCM, although I am a little worried TCM >>> is becoming our new hammer with everything a nail. It would be better IMO >>> to keep TCM scoped to essential functionality as it’s critical to >>> correctness. Perhaps we could extend its APIs to less critical services >>> without intertwining them with membership, schema and epoch handling. >>> >>> On 20 Dec 2024, at 09:43, Štefan Miklošovič >>> wrote: >>>
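The "In Maintenance" idea from this message, where schema and membership changes are frozen while an externally managed operation runs, can be sketched as a small state holder. Everything here is an assumption made for illustration: `ClusterState`, `ChangeKind` and `MaintenanceReason` are hypothetical names, and the thread does not specify how such a state would actually be represented in TCM.

```python
from enum import Enum
from typing import List, Optional


class ChangeKind(Enum):
    SCHEMA = "schema"
    MEMBERSHIP = "membership"


class MaintenanceReason(Enum):
    # Sub-states listed in the proposal; managed outside the main log.
    UPGRADE = "upgrade"
    DOWNGRADE = "downgrade"
    MIGRATION = "migration"
    CAPABILITY_CHANGE = "capability enablement/disablement"


class ClusterState:
    """Hypothetical model: rejects changes while 'In Maintenance'."""

    def __init__(self) -> None:
        self.maintenance: Optional[MaintenanceReason] = None
        self.applied: List[str] = []

    def enter_maintenance(self, reason: MaintenanceReason) -> None:
        if self.maintenance is not None:
            raise RuntimeError("already in maintenance")
        self.maintenance = reason

    def exit_maintenance(self) -> None:
        self.maintenance = None

    def submit(self, kind: ChangeKind, description: str) -> bool:
        # Schema and membership changes are frozen during maintenance.
        if self.maintenance is not None:
            return False
        self.applied.append(f"{kind.value}: {description}")
        return True
```

In this sketch a rejected change is simply reported with `False`; a real design would also have to decide whether such changes are queued, retried, or surfaced to the operator.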
Re: Capabilities
On Fri, Dec 20, 2024 at 11:06 AM Benedict wrote: > If TCM breaks we all have a really bad time, much worse than if any one of > these features individually has problems. If you break TCM in the right way > the cluster could become inoperable, or operations like topology changes > may be prevented. > Benedict, when you say this are you speaking hypothetically (in the sense that by using TCM more we increase the probability of using it "wrong" and hitting an unknown edge case) or are there known ways today that TCM "breaks"? Jordan > This means that even a parallel log has some risk if we end up modifying > shared functionality. > > > > On 20 Dec 2024, at 18:47, Štefan Miklošovič > wrote: > > > I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super > reasonable to be put there. > > On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič > wrote: > >> I am super hesitant to base distributed guardrails or any configuration >> for that matter on anything but TCM. Does not "C" in TCM stand for >> "configuration" anyway? So rename it to TSM like "schema" then if it is >> meant to be just for that. It seems to be quite ridiculous to code tables >> with caches on top when we have way more effective tooling thanks to CEP-21 >> to deal with that with clear advantages of getting rid of all of that old >> mechanism we have in place. >> >> I have not seen any concrete examples of risks why using TCM should be >> just for what it is currently for. Why not put the configuration meant to >> be cluster-wide into that? >> >> What is it ... performance? What does even the term "additional >> complexity" mean? Complex in what? Do you think that putting there 3 types >> of transformations in case of guardrails which flip some booleans and >> numbers would suddenly make TCM way more complex? Come on ... >> >> This has nothing to do with what Jordan is trying to introduce. 
I think >> we all agree he knows what he is doing and if he evaluates that TCM is too >> much for his use case (or it is not a good fit) that is perfectly fine. >> >> On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta wrote: >> >>> > It should be possible to use distributed system tables just fine for >>> capabilities, config and guardrails. >>> >>> I have been thinking about this recently and I agree we should be wary >>> about introducing new TCM states and create additional complexity that can >>> be serviced by existing data dissemination mechanisms (gossip/system >>> tables). I would prefer that we take a more phased and incremental approach >>> to introduce new TCM states. >>> >>> As a way to accomplish that, I have thought about introducing a new >>> generic TCM state "In Maintenance", where schema or membership changes are >>> "frozen/disallowed" while an external operation is taking place. This >>> "external operation" could mean many things: >>> - Upgrade >>> - Downgrade >>> - Migration >>> - Capability Enablement/Disablement >>> >>> These could be sub-states of the "Maintenance" TCM state, that could be >>> managed externally (via cache/gossip/system tables/sidecar). Once these >>> sub-states are validated thouroughly and mature enough, we could "promote" >>> them to top-level TCM states. >>> >>> In the end what really matters is that cluster and schema membership >>> changes do not happen while a miscellaneous operation is taking place. >>> >>> Would this make sense as an initial way to integrate TCM with >>> capabilities framework ? >>> >>> On Fri, Dec 20, 2024 at 4:53 AM Benedict wrote: >>> If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information. 
We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here. Regarding disagreements: if you use LWTs then there are no consistency issues to worry about. Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail. It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. Perhaps we could extend its APIs to less critical services without intertwining them with membership, schema and epoch handling. On 20 Dec 2024, at 09:43, Štefan Miklošovič wrote: I find TCM way more comfortable to work with. The capability of log being replayed on restart and catching up with everything else automatically is god-sent. If we had that on "good old distributed tables", then is it not true that we would need to take extra care of that, e.g. we would need to repair it etc ... It might be the source of the discrepancies / disagreements etc. TCM is just "maintenance-free" and _just works_. I
Re: Capabilities
I am super hesitant to base distributed guardrails or any configuration for that matter on anything but TCM. Does not "C" in TCM stand for "configuration" anyway? So rename it to TSM like "schema" then if it is meant to be just for that. It seems to be quite ridiculous to code tables with caches on top when we have way more effective tooling thanks to CEP-21 to deal with that with clear advantages of getting rid of all of that old mechanism we have in place. I have not seen any concrete examples of risks why using TCM should be just for what it is currently for. Why not put the configuration meant to be cluster-wide into that? What is it ... performance? What does even the term "additional complexity" mean? Complex in what? Do you think that putting there 3 types of transformations in case of guardrails which flip some booleans and numbers would suddenly make TCM way more complex? Come on ... This has nothing to do with what Jordan is trying to introduce. I think we all agree he knows what he is doing and if he evaluates that TCM is too much for his use case (or it is not a good fit) that is perfectly fine. On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta wrote: > > It should be possible to use distributed system tables just fine for > capabilities, config and guardrails. > > I have been thinking about this recently and I agree we should be wary > about introducing new TCM states and create additional complexity that can > be serviced by existing data dissemination mechanisms (gossip/system > tables). I would prefer that we take a more phased and incremental approach > to introduce new TCM states. > > As a way to accomplish that, I have thought about introducing a new > generic TCM state "In Maintenance", where schema or membership changes are > "frozen/disallowed" while an external operation is taking place. 
This > "external operation" could mean many things: > - Upgrade > - Downgrade > - Migration > - Capability Enablement/Disablement > > These could be sub-states of the "Maintenance" TCM state, that could be > managed externally (via cache/gossip/system tables/sidecar). Once these > sub-states are validated thouroughly and mature enough, we could "promote" > them to top-level TCM states. > > In the end what really matters is that cluster and schema membership > changes do not happen while a miscellaneous operation is taking place. > > Would this make sense as an initial way to integrate TCM with capabilities > framework ? > > On Fri, Dec 20, 2024 at 4:53 AM Benedict wrote: > >> If you perform a read from a distributed table on startup you will find >> the latest information. What catchup are you thinking of? I don’t think any >> of the features we talked about need a log, only the latest information. >> >> We can (and should) probably introduce event listeners for distributed >> tables, as this is also a really great feature, but I don’t think this >> should be necessary here. >> >> Regarding disagreements: if you use LWTs then there are no consistency >> issues to worry about. >> >> Again, I’m not opposed to using TCM, although I am a little worried TCM >> is becoming our new hammer with everything a nail. It would be better IMO >> to keep TCM scoped to essential functionality as it’s critical to >> correctness. Perhaps we could extend its APIs to less critical services >> without intertwining them with membership, schema and epoch handling. >> >> On 20 Dec 2024, at 09:43, Štefan Miklošovič >> wrote: >> >> >> I find TCM way more comfortable to work with. The capability of log being >> replayed on restart and catching up with everything else automatically is >> god-sent. If we had that on "good old distributed tables", then is it not >> true that we would need to take extra care of that, e.g. we would need to >> repair it etc ... 
It might be the source of the discrepancies / >> disagreements etc. TCM is just "maintenance-free" and _just works_. >> >> I think I was also investigating distributed tables but was just pulled >> towards TCM naturally because of its goodies. >> >> On Fri, Dec 20, 2024 at 10:08 AM Benedict wrote: >> >>> TCM is a perfectly valid basis for this, but TCM is only really >>> *necessary* to solve meta config problems where we can’t rely on the rest >>> of the database working. Particularly placement issues, which is why schema >>> and membership need to live there. >>> >>> It should be possible to use distributed system tables just fine for >>> capabilities, config and guardrails. >>> >>> That said, it’s possible config might be better represented as part of >>> the schema (and we already store some relevant config there) in which case >>> it would live in TCM automatically. Migrating existing configs to a >>> distributed setup will be fun however we do it though. >>> >>> Capabilities also feel naturally related to other membership >>> information, so TCM might be the most suitable place, particularly for >>> handling downgrades after capabilities have been enabled (if we ever
Re: Capabilities
I don’t know the details and limits of TCM well enough to comment on what it can do, but I think it’s fair to say that if we can’t put a few hundred configuration options in, taking up maybe a few MB, there’s a fundamental problem with it, and we need to seriously reconsider if it’s ready for production. Jon On Fri, Dec 20, 2024 at 10:47 AM Štefan Miklošovič wrote: > I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super > reasonable to be put there. > > On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič > wrote: > >> I am super hesitant to base distributed guardrails or any configuration >> for that matter on anything but TCM. Does not "C" in TCM stand for >> "configuration" anyway? So rename it to TSM like "schema" then if it is >> meant to be just for that. It seems to be quite ridiculous to code tables >> with caches on top when we have way more effective tooling thanks to CEP-21 >> to deal with that with clear advantages of getting rid of all of that old >> mechanism we have in place. >> >> I have not seen any concrete examples of risks why using TCM should be >> just for what it is currently for. Why not put the configuration meant to >> be cluster-wide into that? >> >> What is it ... performance? What does even the term "additional >> complexity" mean? Complex in what? Do you think that putting there 3 types >> of transformations in case of guardrails which flip some booleans and >> numbers would suddenly make TCM way more complex? Come on ... >> >> This has nothing to do with what Jordan is trying to introduce. I think >> we all agree he knows what he is doing and if he evaluates that TCM is too >> much for his use case (or it is not a good fit) that is perfectly fine. >> >> On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta wrote: >> >>> > It should be possible to use distributed system tables just fine for >>> capabilities, config and guardrails.
>>> >>> I have been thinking about this recently and I agree we should be wary >>> about introducing new TCM states and create additional complexity that can >>> be serviced by existing data dissemination mechanisms (gossip/system >>> tables). I would prefer that we take a more phased and incremental approach >>> to introduce new TCM states. >>> >>> As a way to accomplish that, I have thought about introducing a new >>> generic TCM state "In Maintenance", where schema or membership changes are >>> "frozen/disallowed" while an external operation is taking place. This >>> "external operation" could mean many things: >>> - Upgrade >>> - Downgrade >>> - Migration >>> - Capability Enablement/Disablement >>> >>> These could be sub-states of the "Maintenance" TCM state, that could be >>> managed externally (via cache/gossip/system tables/sidecar). Once these >>> sub-states are validated thouroughly and mature enough, we could "promote" >>> them to top-level TCM states. >>> >>> In the end what really matters is that cluster and schema membership >>> changes do not happen while a miscellaneous operation is taking place. >>> >>> Would this make sense as an initial way to integrate TCM with >>> capabilities framework ? >>> >>> On Fri, Dec 20, 2024 at 4:53 AM Benedict wrote: >>> If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information. We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here. Regarding disagreements: if you use LWTs then there are no consistency issues to worry about. Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail. It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. 
Perhaps we could extend its APIs to less critical services without intertwining them with membership, schema and epoch handling. On 20 Dec 2024, at 09:43, Štefan Miklošovič wrote: I find TCM way more comfortable to work with. The capability of log being replayed on restart and catching up with everything else automatically is god-sent. If we had that on "good old distributed tables", then is it not true that we would need to take extra care of that, e.g. we would need to repair it etc ... It might be the source of the discrepancies / disagreements etc. TCM is just "maintenance-free" and _just works_. I think I was also investigating distributed tables but was just pulled towards TCM naturally because of its goodies. On Fri, Dec 20, 2024 at 10:08 AM Benedict wrote: > TCM is a perfectly valid basis for this, but TCM is only really > *necessary* to solve meta config problems where we can’t rely on the rest >>
Re: Capabilities
If TCM breaks we all have a really bad time, much worse than if any one of these features individually has problems. If you break TCM in the right way the cluster could become inoperable, or operations like topology changes may be prevented. So, we want to keep its responsibilities scoped sensibly, so we minimise the risk of changes to these features. This means that even a parallel log has some risk if we end up modifying shared functionality. On 20 Dec 2024, at 18:47, Štefan Miklošovič wrote: I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super reasonable to be put there. On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič wrote: I am super hesitant to base distributed guardrails or any configuration for that matter on anything but TCM. Does not "C" in TCM stand for "configuration" anyway? So rename it to TSM like "schema" then if it is meant to be just for that. It seems to be quite ridiculous to code tables with caches on top when we have way more effective tooling thanks to CEP-21 to deal with that with clear advantages of getting rid of all of that old mechanism we have in place. I have not seen any concrete examples of risks why using TCM should be just for what it is currently for. Why not put the configuration meant to be cluster-wide into that? What is it ... performance? What does even the term "additional complexity" mean? Complex in what? Do you think that putting there 3 types of transformations in case of guardrails which flip some booleans and numbers would suddenly make TCM way more complex? Come on ... This has nothing to do with what Jordan is trying to introduce. I think we all agree he knows what he is doing and if he evaluates that TCM is too much for his use case (or it is not a good fit) that is perfectly fine.
On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta wrote: > It should be possible to use distributed system tables just fine for capabilities, config and guardrails. I have been thinking about this recently and I agree we should be wary about introducing new TCM states and create additional complexity that can be serviced by existing data dissemination mechanisms (gossip/system tables). I would prefer that we take a more phased and incremental approach to introduce new TCM states. As a way to accomplish that, I have thought about introducing a new generic TCM state "In Maintenance", where schema or membership changes are "frozen/disallowed" while an external operation is taking place. This "external operation" could mean many things: - Upgrade - Downgrade - Migration - Capability Enablement/Disablement These could be sub-states of the "Maintenance" TCM state, that could be managed externally (via cache/gossip/system tables/sidecar). Once these sub-states are validated thouroughly and mature enough, we could "promote" them to top-level TCM states. In the end what really matters is that cluster and schema membership changes do not happen while a miscellaneous operation is taking place. Would this make sense as an initial way to integrate TCM with capabilities framework ? On Fri, Dec 20, 2024 at 4:53 AM Benedict wrote: If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information. We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here. Regarding disagreements: if you use LWTs then there are no consistency issues to worry about. Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail.
It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. Perhaps we could extend its APIs to less critical services without intertwining them with membership, schema and epoch handling. On 20 Dec 2024, at 09:43, Štefan Miklošovič wrote: I find TCM way more comfortable to work with. The capability of log being replayed on restart and catching up with everything else automatically is god-sent. If we had that on "good old distributed tables", then is it not true that we would need to take extra care of that, e.g. we would need to repair it etc ... It might be the source of the discrepancies / disagreements etc. TCM is just "maintenance-free" and _just works_. I think I was also investigating distributed tables but was just pulled towards TCM naturally because of its goodies. On Fri, Dec 20, 2024 at 10:08 AM Benedict wrote: TCM is a perfectly valid basis for this, but TCM is only really *necessary* to solve meta config problems where we can’t rely on the rest of the database working. Particularly placement issues, which is why schema and membership need to live there. It should be possible to use distributed system tables just fi
Re: Capabilities
Mostly conceptual; the problem with a linearizable history is that if you lose some of it (e.g. because some logic bug prevents you from processing some epoch) you stop the world until an operator can step in to perform surgery about what the history should be. I do know of one recent bug to schema changes in CEP-15 that broke TCM in this way. That particular avenue will be hardened, but the fewer places we risk this the better IMO. Of course, there are steps we could take to expose a limited API targeting these use cases, as well as using a separate log for ancillary functionality, that might better balance risk:reward. But equally I’m not sure it makes sense to TCM all the things, and maybe dogfooding our own database features and developing functionality that enables our own use cases could be better where it isn’t necessary 🤷‍♀️ On 20 Dec 2024, at 19:22, Jordan West wrote: On Fri, Dec 20, 2024 at 11:06 AM Benedict wrote: If TCM breaks we all have a really bad time, much worse than if any one of these features individually has problems. If you break TCM in the right way the cluster could become inoperable, or operations like topology changes may be prevented. Benedict, when you say this are you speaking hypothetically (in the sense that by using TCM more we increase the probability of using it "wrong" and hitting an unknown edge case) or are there known ways today that TCM "breaks"? Jordan This means that even a parallel log has some risk if we end up modifying shared functionality. On 20 Dec 2024, at 18:47, Štefan Miklošovič wrote: I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super reasonable to be put there. On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič wrote: I am super hesitant to base distributed guardrails or any configuration for that matter on anything but TCM. Does not "C" in TCM stand for "configuration" anyway? So rename it to TSM like "schema" then if it is meant to be just for that.
It seems to be quite ridiculous to code tables with caches on top when we have way more effective tooling thanks to CEP-21 to deal with that with clear advantages of getting rid of all of that old mechanism we have in place. I have not seen any concrete examples of risks why using TCM should be just for what it is currently for. Why not put the configuration meant to be cluster-wide into that? What is it ... performance? What does even the term "additional complexity" mean? Complex in what? Do you think that putting there 3 types of transformations in case of guardrails which flip some booleans and numbers would suddenly make TCM way more complex? Come on ... This has nothing to do with what Jordan is trying to introduce. I think we all agree he knows what he is doing and if he evaluates that TCM is too much for his use case (or it is not a good fit) that is perfectly fine. On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta wrote: > It should be possible to use distributed system tables just fine for capabilities, config and guardrails. I have been thinking about this recently and I agree we should be wary about introducing new TCM states and create additional complexity that can be serviced by existing data dissemination mechanisms (gossip/system tables). I would prefer that we take a more phased and incremental approach to introduce new TCM states. As a way to accomplish that, I have thought about introducing a new generic TCM state "In Maintenance", where schema or membership changes are "frozen/disallowed" while an external operation is taking place. This "external operation" could mean many things: - Upgrade - Downgrade - Migration - Capability Enablement/Disablement These could be sub-states of the "Maintenance" TCM state, that could be managed externally (via cache/gossip/system tables/sidecar).
Once these sub-states are validated thouroughly and mature enough, we could "promote" them to top-level TCM states. In the end what really matters is that cluster and schema membership changes do not happen while a miscellaneous operation is taking place. Would this make sense as an initial way to integrate TCM with capabilities framework ? On Fri, Dec 20, 2024 at 4:53 AM Benedict wrote: If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information. We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here. Regarding disagreements: if you use LWTs then there are no consistency issues to worry about. Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail. It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. Perhaps we could extend
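The risk raised at the top of this message, that a consumer of a linearizable history cannot make progress past an epoch it failed to process, can be illustrated with a toy log follower. This is a sketch under assumptions, not how TCM is implemented: `LogFollower` is a hypothetical class, and a lost epoch is modelled simply as an entry that never arrives.

```python
from typing import Dict, List, Optional


class LogFollower:
    """Toy consumer of a totally ordered log, keyed by epoch number."""

    def __init__(self) -> None:
        self.applied_epoch = 0
        self.pending: Dict[int, str] = {}
        self.history: List[str] = []

    def receive(self, epoch: int, entry: str) -> None:
        # Entries may arrive out of order; already applied epochs are ignored.
        if epoch > self.applied_epoch:
            self.pending[epoch] = entry
        self._drain()

    def _drain(self) -> None:
        # Only the next consecutive epoch may be applied; a gap blocks
        # everything behind it, however many later entries are buffered.
        while self.applied_epoch + 1 in self.pending:
            nxt = self.applied_epoch + 1
            self.history.append(self.pending.pop(nxt))
            self.applied_epoch = nxt

    def stalled_on(self) -> Optional[int]:
        """Return the missing epoch that blocks progress, or None."""
        return self.applied_epoch + 1 if self.pending else None
```

With epochs 1, 3 and 4 received, the follower stays at epoch 1 until epoch 2 is supplied. If that entry can never be processed, nothing later is applied, which is the "stop the world" situation that requires operator intervention.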
[DISCUSS] Index selection syntax for CASSANDRA-18112
Some of you are probably familiar with the work in the DS fork to improve the selection of indexes for SAI queries in https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424 . While I'm eagerly anticipating working on that in the new year, I'm also wondering whether we think some simple CQL extensions to manually control index selection would be helpful. Maxwell proposed this a while back in CASSANDRA-18112, and I'd like to propose a syntax: ex. Do not use the specified index during the query. SELECT ... FROM ... WHERE ... WITHOUT INDEX This could be helpful for intersection queries where one of the provided clauses is not very selective and could simply be handled via post-filtering. ex. Require the specified index to be used. SELECT ... FROM ... WHERE ... WITH INDEX This could be helpful in scenarios where multiple indexes exist on a column and was the primary motivation for CASSANDRA-18112. Thoughts?
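To make the two proposed hints concrete, here is a rough model of how a planner might narrow its candidate indexes. It is only a sketch under assumptions: the message does not define the exact semantics, so the behaviour below (an excluded index is dropped and its clause left to post-filtering; a required index displaces other indexes on the same column) is one possible reading, and `select_indexes` and its parameters are hypothetical names rather than Cassandra APIs. It also assumes each hint names a single index.

```python
from typing import Dict, Optional, Set


def select_indexes(
    candidates: Dict[str, str],
    with_index: Optional[str] = None,
    without_index: Optional[str] = None,
) -> Set[str]:
    """Pick the indexes a query would use.

    candidates maps index name -> the column it covers, restricted to
    indexes that could serve some clause of the query (assumed input).
    """
    if with_index is not None and with_index not in candidates:
        raise ValueError(f"required index {with_index!r} cannot serve this query")
    if with_index is not None and with_index == without_index:
        raise ValueError("the same index cannot be both required and excluded")

    chosen = set(candidates)
    if without_index is not None:
        # Excluded index: its clause would be handled by post-filtering.
        chosen.discard(without_index)
    if with_index is not None:
        # Required index: drop competing indexes on the same column.
        column = candidates[with_index]
        chosen = {
            name for name in chosen
            if candidates[name] != column or name == with_index
        }
    return chosen
```

For example, with two indexes on column `a` and one on column `b`, requiring the second index on `a` keeps it alongside the index on `b`, while excluding the index on `b` leaves only the indexes on `a`.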