date:20241220

Re: Capabilities

2024-12-20 Thread Jordan West

Benedict, I agree with you that TCM might be overkill for capabilities. It’s
truly something that’s fine to be eventually consistent. Riak’s
implementation used a local ETS table (ETS is built into Erlang; the
equivalent for us would be a local-only system table) and an efficient and
reliable gossip protocol. The data was basically a simple CRDT (a map of
supported features in preference order, with the only operations being
additions and reads).
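
To make the "additions and reads only" idea concrete, here is a minimal sketch of a grow-only capability set. It is not Riak's actual data structure: the entry shape (node, capability, mode, rank) and the names advertise, merge and negotiate are assumptions made for illustration.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical entry: (node, capability, mode, preference rank; 0 = most preferred).
Entry = Tuple[str, str, str, int]
State = FrozenSet[Entry]


def advertise(state: State, node: str, capability: str, modes: List[str]) -> State:
    """Add a node's supported modes in preference order. Additions only."""
    return state | {(node, capability, mode, rank) for rank, mode in enumerate(modes)}


def merge(a: State, b: State) -> State:
    """Set union is commutative, associative and idempotent, so gossip order does not matter."""
    return a | b


def negotiate(state: State, capability: str, me: str) -> Optional[str]:
    """Return my most preferred mode that every advertising node also supports."""
    by_node = {}
    for node, cap, mode, rank in state:
        if cap == capability:
            by_node.setdefault(node, {})[mode] = rank
    mine = by_node.get(me, {})
    for mode in sorted(mine, key=mine.get):
        # Nodes that never advertised this capability are not considered here.
        if all(mode in modes for modes in by_node.values()):
            return mode
    return None
```

Because the merge is a plain set union, two nodes that exchange state in any order converge on the same view, which is what makes eventual consistency acceptable here.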

So I agree with you that we could be using TCM as a hammer for every nail
here. But I’m also hesitant to introduce something new. Distributed tables,
or a virtual table with some way to aggregate across the cluster, would
also work. In either case we would need a local cache (like Denylist).

From a requirements perspective, reads need to be local (because they may be
done in a hot path) but writes can be slow (they typically only change on
startup or during operator intervention).
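
A rough sketch of that read/write split, assuming a hypothetical slow loader (it could stand in for a distributed table read, gossip state, or TCM). The class and method names are made up for illustration and are not the Denylist code.

```python
import threading
from types import MappingProxyType
from typing import Callable, Dict, Mapping


class LocalCapabilityCache:
    """Reads hit an in-memory snapshot; only refresh() touches the slow source."""

    def __init__(self, load: Callable[[], Dict[str, bool]]):
        self._load = load  # hypothetical slow loader, an assumption of this sketch
        self._lock = threading.Lock()
        self._snapshot: Mapping[str, bool] = MappingProxyType({})
        self.refresh()

    def refresh(self) -> None:
        """Slow path: run on startup or after an operator change."""
        fresh = MappingProxyType(dict(self._load()))
        with self._lock:  # serialize writers; readers never take the lock
            self._snapshot = fresh

    def is_enabled(self, capability: str) -> bool:
        """Hot path: one reference read plus a dict lookup, no I/O."""
        return self._snapshot.get(capability, False)
```

Swapping a whole immutable snapshot keeps the hot path free of locks and I/O, at the cost of readers seeing slightly stale data until the next refresh.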

Jordan



On Fri, Dec 20, 2024 at 01:53 Benedict  wrote:

> If you perform a read from a distributed table on startup you will find
> the latest information. What catchup are you thinking of? I don’t think any
> of the features we talked about need a log, only the latest information.
>
> We can (and should) probably introduce event listeners for distributed
> tables, as this is also a really great feature, but I don’t think this
> should be necessary here.
>
> Regarding disagreements: if you use LWTs then there are no consistency
> issues to worry about.
>
> Again, I’m not opposed to using TCM, although I am a little worried TCM is
> becoming our new hammer with everything a nail. It would be better IMO to
> keep TCM scoped to essential functionality as it’s critical to correctness.
> Perhaps we could extend its APIs to less critical services without
> intertwining them with membership, schema and epoch handling.
>
> On 20 Dec 2024, at 09:43, Štefan Miklošovič 
> wrote:
>
> 
>
> I find TCM way more comfortable to work with. The capability of log being
> replayed on restart and catching up with everything else automatically is
> god-sent. If we had that on "good old distributed tables", then is it not
> true that we would need to take extra care of that, e.g. we would need to
> repair it etc ... It might be the source of the discrepancies /
> disagreements etc. TCM is just "maintenance-free" and _just works_.
>
> I think I was also investigating distributed tables but was just pulled
> towards TCM naturally because of its goodies.
>
> On Fri, Dec 20, 2024 at 10:08 AM Benedict  wrote:
>
>> TCM is a perfectly valid basis for this, but TCM is only really
>> *necessary* to solve meta config problems where we can’t rely on the rest
>> of the database working. Particularly placement issues, which is why schema
>> and membership need to live there.
>>
>> It should be possible to use distributed system tables just fine for
>> capabilities, config and guardrails.
>>
>> That said, it’s possible config might be better represented as part of
>> the schema (and we already store some relevant config there) in which case
>> it would live in TCM automatically. Migrating existing configs to a
>> distributed setup will be fun however we do it though.
>>
>> Capabilities also feel naturally related to other membership information,
>> so TCM might be the most suitable place, particularly for handling
>> downgrades after capabilities have been enabled (if we ever expect to
>> support turning off capabilities and then downgrading - which today we
>> mostly don’t).
>>
>> On 20 Dec 2024, at 08:42, Štefan Miklošovič 
>> wrote:
>>
>> 
>> Jordan,
>>
>> I also think that having it on TCM would be ideal and we should explore
>> this path first before doing anything custom.
>>
>> Regarding my idea about the guardrails in TCM, when I prototyped that and
>> wanted to make it happen, there was a little bit of a pushback (1) (even
>> though super reasonable one) that TCM is just too young at the moment and
>> it would be desirable to go through some stabilisation period.
>>
>> Another idea was that we should not make just guardrails happen but the
>> whole config should be in TCM. From what I put together, Sam / Alex does
>> not seem to be opposed to this idea, rather the opposite, but having CEP
>> about that is way more involved than having just guardrails there. I
>> consider guardrails to be kind of special and I do not think that having
>> all configurations in TCM (which guardrails are part of) is the absolute
>> must in order to deliver that. I may start with guardrails CEP and you may
>> explore Capabilities CEP on TCM too, if that makes sense?
>>
>> I just wanted to raise the point about the time this would be delivered.
>> If Capabilities are built on TCM and I wanted to do Guardrails on TCM too
>> but was explained it is probably too soon, I guess you would experience
>> something similar.
>>
>> Sam's comment is from May and maybe a lot has changed since in then and
>> his comment is not applicable anymore. It wou

Re: Capabilities

2024-12-20 Thread Jordan West

One minor clarification: ETS is entirely in memory (unless you explicitly
dump it to disk or use DETS), so the equivalence to a local system table is
only partially accurate, but I think the parallel is fine in the case of
what I was describing.

Jordan

On Fri, Dec 20, 2024 at 09:07 Jordan West  wrote:

> Benedict, I agree with you that TCM might be overkill for capabilities. It’s
> truly something that’s fine to be eventually consistent. Riak’s
> implementation used a local ETS table (ETS is built into Erlang; the
> equivalent for us would be a local-only system table) and an efficient and
> reliable gossip protocol. The data was basically a simple CRDT (a map of
> supported features in preference order, with the only operations being
> additions and reads).
>
> So I agree with you that we could be using TCM as a hammer for every nail
> here. But I’m also hesitant to introduce something new. Distributed tables,
> or a virtual table with some way to aggregate across the cluster, would
> also work. In either case we would need a local cache (like Denylist).
>
> From a requirements perspective reads need to be local (because they may
> be done in a hot path) but writes can be slow (typically only change on
> start up or during operator intervention).
>
> Jordan
>
>
>
> On Fri, Dec 20, 2024 at 01:53 Benedict  wrote:
>
>> If you perform a read from a distributed table on startup you will find
>> the latest information. What catchup are you thinking of? I don’t think any
>> of the features we talked about need a log, only the latest information.
>>
>> We can (and should) probably introduce event listeners for distributed
>> tables, as this is also a really great feature, but I don’t think this
>> should be necessary here.
>>
>> Regarding disagreements: if you use LWTs then there are no consistency
>> issues to worry about.
>>
>> Again, I’m not opposed to using TCM, although I am a little worried TCM
>> is becoming our new hammer with everything a nail. It would be better IMO
>> to keep TCM scoped to essential functionality as it’s critical to
>> correctness. Perhaps we could extend its APIs to less critical services
>> without intertwining them with membership, schema and epoch handling.
>>
>> On 20 Dec 2024, at 09:43, Štefan Miklošovič 
>> wrote:
>>
>> 
>>
>> I find TCM way more comfortable to work with. The capability of log being
>> replayed on restart and catching up with everything else automatically is
>> god-sent. If we had that on "good old distributed tables", then is it not
>> true that we would need to take extra care of that, e.g. we would need to
>> repair it etc ... It might be the source of the discrepancies /
>> disagreements etc. TCM is just "maintenance-free" and _just works_.
>>
>> I think I was also investigating distributed tables but was just pulled
>> towards TCM naturally because of its goodies.
>>
>> On Fri, Dec 20, 2024 at 10:08 AM Benedict  wrote:
>>
>>> TCM is a perfectly valid basis for this, but TCM is only really
>>> *necessary* to solve meta config problems where we can’t rely on the rest
>>> of the database working. Particularly placement issues, which is why schema
>>> and membership need to live there.
>>>
>>> It should be possible to use distributed system tables just fine for
>>> capabilities, config and guardrails.
>>>
>>> That said, it’s possible config might be better represented as part of
>>> the schema (and we already store some relevant config there) in which case
>>> it would live in TCM automatically. Migrating existing configs to a
>>> distributed setup will be fun however we do it though.
>>>
>>> Capabilities also feel naturally related to other membership
>>> information, so TCM might be the most suitable place, particularly for
>>> handling downgrades after capabilities have been enabled (if we ever expect
>>> to support turning off capabilities and then downgrading - which today we
>>> mostly don’t).
>>>
>>> On 20 Dec 2024, at 08:42, Štefan Miklošovič 
>>> wrote:
>>>
>>> 
>>> Jordan,
>>>
>>> I also think that having it on TCM would be ideal and we should explore
>>> this path first before doing anything custom.
>>>
>>> Regarding my idea about the guardrails in TCM, when I prototyped that
>>> and wanted to make it happen, there was a little bit of a pushback (1)
>>> (even though super reasonable one) that TCM is just too young at the moment
>>> and it would be desirable to go through some stabilisation period.
>>>
>>> Another idea was that we should not make just guardrails happen but the
>>> whole config should be in TCM. From what I put together, Sam / Alex does
>>> not seem to be opposed to this idea, rather the opposite, but having CEP
>>> about that is way more involved than having just guardrails there. I
>>> consider guardrails to be kind of special and I do not think that having
>>> all configurations in TCM (which guardrails are part of) is the absolute
>>> must in order to deliver that. I may start with guardrails CEP and you may
>>> explore Capabilitie

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič

Having a parallel and feature focused TCM log as you suggested seems
perfectly reasonable to me.

On Fri, Dec 20, 2024 at 11:33 AM Benedict  wrote:

> Guardrails are broadly the same as Auth which works this way, but with
> less criticality. It’s fine if guardrails are updated slowly.
>
> But, again, TCM is a fine target for this. It would however be nice to
> have an in-between capability though, TCM-lite if you will, for these
> features. Perhaps even just a parallel TCM log.
>
>
>
>
> On 20 Dec 2024, at 10:24, Štefan Miklošovič 
> wrote:
>
> 
> What do you mean by a distributed table? You mean these in
> system_distributed keyspace?
>
> If so, imagine we introduce a table system_distributed.guardrails where
> each row would hold what a guardrail would be set to, hence on guardrails
> evaluation in runtime (and there are a bunch of them to consider), it would
> read this table every single time? Basically performing a select query on
> this and that guardrail to see what its value is? There are plenty of
> places where guardrails are evaluated, would not this slow things down
> considerably?
>
> So, if we do not want to do that, would we start to cache it? So table +
> cache? Isn't this becoming just too complicated?
>
> With guardrails in TCM, if we commit a transformation from some node that
> a guardrail xyz changed its state from false to true, this gets propagated
> to every single node in some epoch eventually so there is no reason to read
> them from any table. A node would apply this transformation to itself as it
> digests new epochs it pulled from cms.
>
> The point about hammer and all things being nails resonates with me. I
> agree we should be cautious about this to not "bastardize" TCM by using it
> for something unnecessary, but on the other hand we should be open to
> exploring what such an implementation would mean _in details_ (what we do
> here) before ruling it out for good.
>
> I am all ears if you guys see how it should work differently, I am still
> in the process of putting all parts of the puzzle together so please be so
> nice to prove me wrong.
>
> Regards
>
> On Fri, Dec 20, 2024 at 10:53 AM Benedict  wrote:
>
>> If you perform a read from a distributed table on startup you will find
>> the latest information. What catchup are you thinking of? I don’t think any
>> of the features we talked about need a log, only the latest information.
>>
>> We can (and should) probably introduce event listeners for distributed
>> tables, as this is also a really great feature, but I don’t think this
>> should be necessary here.
>>
>> Regarding disagreements: if you use LWTs then there are no consistency
>> issues to worry about.
>>
>> Again, I’m not opposed to using TCM, although I am a little worried TCM
>> is becoming our new hammer with everything a nail. It would be better IMO
>> to keep TCM scoped to essential functionality as it’s critical to
>> correctness. Perhaps we could extend its APIs to less critical services
>> without intertwining them with membership, schema and epoch handling.
>>
>> On 20 Dec 2024, at 09:43, Štefan Miklošovič 
>> wrote:
>>
>> 
>> I find TCM way more comfortable to work with. The capability of log being
>> replayed on restart and catching up with everything else automatically is
>> god-sent. If we had that on "good old distributed tables", then is it not
>> true that we would need to take extra care of that, e.g. we would need to
>> repair it etc ... It might be the source of the discrepancies /
>> disagreements etc. TCM is just "maintenance-free" and _just works_.
>>
>> I think I was also investigating distributed tables but was just pulled
>> towards TCM naturally because of its goodies.
>>
>> On Fri, Dec 20, 2024 at 10:08 AM Benedict  wrote:
>>
>>> TCM is a perfectly valid basis for this, but TCM is only really
>>> *necessary* to solve meta config problems where we can’t rely on the rest
>>> of the database working. Particularly placement issues, which is why schema
>>> and membership need to live there.
>>>
>>> It should be possible to use distributed system tables just fine for
>>> capabilities, config and guardrails.
>>>
>>> That said, it’s possible config might be better represented as part of
>>> the schema (and we already store some relevant config there) in which case
>>> it would live in TCM automatically. Migrating existing configs to a
>>> distributed setup will be fun however we do it though.
>>>
>>> Capabilities also feel naturally related to other membership
>>> information, so TCM might be the most suitable place, particularly for
>>> handling downgrades after capabilities have been enabled (if we ever expect
>>> to support turning off capabilities and then downgrading - which today we
>>> mostly don’t).
>>>
>>> On 20 Dec 2024, at 08:42, Štefan Miklošovič 
>>> wrote:
>>>
>>> 
>>> Jordan,
>>>
>>> I also think that having it on TCM would be ideal and we should explore
>>> this path first before doing anything custom.
>>>
>>> Regarding my idea

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-20 Thread Brandon Williams

I think after a discussion on #cassandra-dev yesterday, we are going
to remove the requirement for schema agreement to deliver hints, as
suggested by Jeff Jirsa.

Kind Regards,
Brandon

On Thu, Dec 19, 2024 at 7:43 AM Paul Chandler  wrote:
>
> Hi Brandon,
>
> I am not sure which part changes after CASSANDRA-20118, there is still the 
> system mismatch going to CASSANDRA_4 caused by the change in 
> system.compaction_history, and going to UPGRADING, this is caused by the 2 
> different sstable formats, so nothing that CASSANDRA-20118 fixes.
>
> So while CASSANDRA-20118 improves things, it does not fix these specific 
> issues, unless I have missed something?
>
> > On 19 Dec 2024, at 12:17, Brandon Williams  wrote:
> >
> > On Thu, Dec 19, 2024 at 4:11 AM Paul Chandler  wrote:
> >> C*4 -> CASSANDRA_4 : There is a schema mismatch, and hints are not sent 
> >> from C*4 node to C*5 nodes.
> >> CASSANDRA_4 -> UPGRADING: Repairs are not possible and Nodes cannot be 
> >> added or replaced.
> >> UPGRADING-> NONE: No issues.
> >
> > I'll note this will change after CASSANDRA-20118
> >
> >> Any thoughts on whether having SCM controlled by JMX/nodetool is a good 
> >> idea?
> >
> > I think it's a good idea but it's tricky.  As I said on 20118, "An
> > unfortunate consequence of our use of static initialization is that
> > once started, there is no way to change storage compatibility modes"
> > and all the columns are defined statically, so that will have to be
> > overcome.
> >
> > Kind Regards,
> > Brandon
>

Re: Capabilities

2024-12-20 Thread Paulo Motta

> It should be possible to use distributed system tables just fine for
capabilities, config and guardrails.

I have been thinking about this recently and I agree we should be wary of
introducing new TCM states and creating additional complexity for needs that
can be served by existing data dissemination mechanisms (gossip/system
tables). I would prefer that we take a more phased and incremental approach
to introducing new TCM states.

As a way to accomplish that, I have thought about introducing a new generic
TCM state "In Maintenance", where schema or membership changes are
"frozen/disallowed" while an external operation is taking place. This
"external operation" could mean many things:
- Upgrade
- Downgrade
- Migration
- Capability Enablement/Disablement

These could be sub-states of the "Maintenance" TCM state that could be
managed externally (via cache/gossip/system tables/sidecar). Once these
sub-states are validated thoroughly and mature enough, we could "promote"
them to top-level TCM states.

In the end what really matters is that cluster and schema membership
changes do not happen while a miscellaneous operation is taking place.
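
As a rough illustration of the idea (not a proposal for the actual TCM API), the guard could look like the sketch below. The enum values mirror the list above; ClusterState and its method names are hypothetical.

```python
from enum import Enum
from typing import Optional


class MaintenanceKind(Enum):
    UPGRADE = "upgrade"
    DOWNGRADE = "downgrade"
    MIGRATION = "migration"
    CAPABILITY_CHANGE = "capability_change"


class ClusterState:
    """Hypothetical holder of the single generic 'In Maintenance' state."""

    def __init__(self) -> None:
        self.maintenance: Optional[MaintenanceKind] = None

    def enter_maintenance(self, kind: MaintenanceKind) -> None:
        if self.maintenance is not None:
            raise RuntimeError(f"already in maintenance: {self.maintenance.value}")
        self.maintenance = kind

    def exit_maintenance(self) -> None:
        self.maintenance = None

    def check_change_allowed(self, change: str) -> None:
        """Guard to call before committing a schema or membership change."""
        if self.maintenance is not None:
            raise RuntimeError(
                f"{change} rejected: cluster is in maintenance ({self.maintenance.value})"
            )
```

Only the top-level "in maintenance or not" flag would need to live in TCM in this sketch; what the sub-state means could be tracked externally.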

Would this make sense as an initial way to integrate TCM with the
capabilities framework?

On Fri, Dec 20, 2024 at 4:53 AM Benedict  wrote:

> If you perform a read from a distributed table on startup you will find
> the latest information. What catchup are you thinking of? I don’t think any
> of the features we talked about need a log, only the latest information.
>
> We can (and should) probably introduce event listeners for distributed
> tables, as this is also a really great feature, but I don’t think this
> should be necessary here.
>
> Regarding disagreements: if you use LWTs then there are no consistency
> issues to worry about.
>
> Again, I’m not opposed to using TCM, although I am a little worried TCM is
> becoming our new hammer with everything a nail. It would be better IMO to
> keep TCM scoped to essential functionality as it’s critical to correctness.
> Perhaps we could extend its APIs to less critical services without
> intertwining them with membership, schema and epoch handling.
>
> On 20 Dec 2024, at 09:43, Štefan Miklošovič 
> wrote:
>
> 
> I find TCM way more comfortable to work with. The capability of log being
> replayed on restart and catching up with everything else automatically is
> god-sent. If we had that on "good old distributed tables", then is it not
> true that we would need to take extra care of that, e.g. we would need to
> repair it etc ... It might be the source of the discrepancies /
> disagreements etc. TCM is just "maintenance-free" and _just works_.
>
> I think I was also investigating distributed tables but was just pulled
> towards TCM naturally because of its goodies.
>
> On Fri, Dec 20, 2024 at 10:08 AM Benedict  wrote:
>
>> TCM is a perfectly valid basis for this, but TCM is only really
>> *necessary* to solve meta config problems where we can’t rely on the rest
>> of the database working. Particularly placement issues, which is why schema
>> and membership need to live there.
>>
>> It should be possible to use distributed system tables just fine for
>> capabilities, config and guardrails.
>>
>> That said, it’s possible config might be better represented as part of
>> the schema (and we already store some relevant config there) in which case
>> it would live in TCM automatically. Migrating existing configs to a
>> distributed setup will be fun however we do it though.
>>
>> Capabilities also feel naturally related to other membership information,
>> so TCM might be the most suitable place, particularly for handling
>> downgrades after capabilities have been enabled (if we ever expect to
>> support turning off capabilities and then downgrading - which today we
>> mostly don’t).
>>
>> On 20 Dec 2024, at 08:42, Štefan Miklošovič 
>> wrote:
>>
>> 
>> Jordan,
>>
>> I also think that having it on TCM would be ideal and we should explore
>> this path first before doing anything custom.
>>
>> Regarding my idea about the guardrails in TCM, when I prototyped that and
>> wanted to make it happen, there was a little bit of a pushback (1) (even
>> though super reasonable one) that TCM is just too young at the moment and
>> it would be desirable to go through some stabilisation period.
>>
>> Another idea was that we should not make just guardrails happen but the
>> whole config should be in TCM. From what I put together, Sam / Alex does
>> not seem to be opposed to this idea, rather the opposite, but having CEP
>> about that is way more involved than having just guardrails there. I
>> consider guardrails to be kind of special and I do not think that having
>> all configurations in TCM (which guardrails are part of) is the absolute
>> must in order to deliver that. I may start with guardrails CEP and you may
>> explore Capabilities CEP on TCM too, if that makes sense?
>>
>> I just wanted to raise the point about the time this would be delivered.
>> If

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič

Jordan,

I also think that having it on TCM would be ideal and we should explore
this path first before doing anything custom.

Regarding my idea about the guardrails in TCM, when I prototyped that and
wanted to make it happen, there was a little bit of pushback (1) (even
though a super reasonable one) that TCM is just too young at the moment and
it would be desirable to go through some stabilisation period.

Another idea was that we should not make just guardrails happen but the
whole config should be in TCM. From what I put together, Sam / Alex do
not seem to be opposed to this idea, rather the opposite, but having a CEP
about that is way more involved than having just guardrails there. I
consider guardrails to be kind of special and I do not think that having
all configuration in TCM (which guardrails are part of) is an absolute
must in order to deliver that. I may start with a guardrails CEP and you may
explore a Capabilities CEP on TCM too, if that makes sense?

I just wanted to raise the point about the time this would be delivered. If
Capabilities are built on TCM, and I wanted to do Guardrails on TCM too but
was told it is probably too soon, I guess you would experience
something similar.

Sam's comment is from May and maybe a lot has changed since then and his
comment is not applicable anymore. It would be great to know if we could
build on top of the current trunk already or whether we will wait until
5.1/6.0 is delivered.

(1)
https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326

On Fri, Dec 20, 2024 at 2:17 AM Jordan West  wrote:

> Firstly, glad to see the support and enthusiasm here and in the recent
> Slack discussion. I think there is enough for me to start drafting a CEP.
>
> Stefan, global configuration and capabilities do have some overlap but not
> full overlap. For example, you may want to set globally that a cluster
> enables feature X or control the threshold for a guardrail but you still
> need to know if all nodes support feature X or have that guardrail, the
> latter is what capabilities targets. I do think capabilities are a step
> towards supporting global configuration and the work you described is
> another step (that we could do after capabilities or in parallel with them
> in mind). I am also supportive of exploring global configuration for the
> reasons you mentioned.
>
> In terms of how capabilities get propagated across the cluster, I hadn't
> put much thought into it yet past likely TCM since this will be a new
> feature that lands after TCM. In Riak, we had gossip (but more mature than
> C*s -- this was an area I contributed to a lot so very familiar) to
> disseminate less critical information such as capabilities and a separate
> layer that did TCM. Since we don't have this in C* I don't think we would
> want to build a separate distribution channel for capabilities metadata
> when we already have TCM in place. But I plan to explore this more as I
> draft the CEP.
>
> Jordan
>
> On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič 
> wrote:
>
>> Hi Jordan,
>>
>> what would this look like from the implementation perspective? I was
>> experimenting with transactional guardrails where an operator would control
>> the content of a virtual table which would be backed by TCM so whatever
>> guardrail we would change, this would be automatically and transparently
>> propagated to every node in a cluster. The POC worked quite nicely. TCM is
>> just a vehicle to commit a change which would spread around and all these
>> settings would survive restarts. We would have the same configuration
>> everywhere which is not currently the case because guardrails are
>> configured per node and if not persisted to yaml, on restart their values
>> would be forgotten.
>>
>> Guardrails are just an example, what is quite obvious is to expand this
>> idea to the whole configuration in yaml. Of course, not all properties in
>> yaml make sense to be the same cluster-wise (ip addresses etc ...), but the
>> ones which do would be again set everywhere the same way.
>>
>> The approach I described above is that we make sure that the
>> configuration is same everywhere, hence there can be no misunderstanding
>> what features this or that node has, if we say that all nodes have to have
>> a particular feature because we said so in TCM log so on restart / replay,
>> a node with "catch up" with whatever features it is asked to turn on.
>>
>> Your approach seems to be that we distribute what all capabilities /
>> features a cluster supports and that each individual node configures itself
>> in some way or not to comply?
>>
>> Is there any intersection in these approaches? At first sight it seems
>> somehow related. How is one different from another from your point of view?
>>
>> Regards
>>
>> (1) https://issues.apache.org/jira/browse/CASSANDRA-19593
>>
>> On Thu, Dec 19, 2024 at 12:00 AM Jordan West

Re: Capabilities

2024-12-20 Thread Benedict

TCM is a perfectly valid basis for this, but TCM is only really
*necessary* to solve meta config problems where we can’t rely on the rest
of the database working. Particularly placement issues, which is why schema
and membership need to live there.

It should be possible to use distributed system tables just fine for
capabilities, config and guardrails.

That said, it’s possible config might be better represented as part of the
schema (and we already store some relevant config there) in which case it
would live in TCM automatically. Migrating existing configs to a
distributed setup will be fun however we do it though.

Capabilities also feel naturally related to other membership information,
so TCM might be the most suitable place, particularly for handling
downgrades after capabilities have been enabled (if we ever expect to
support turning off capabilities and then downgrading - which today we
mostly don’t).

On 20 Dec 2024, at 08:42, Štefan Miklošovič wrote:

> Jordan,
>
> I also think that having it on TCM would be ideal and we should explore
> this path first before doing anything custom.
>
> Regarding my idea about the guardrails in TCM, when I prototyped that and
> wanted to make it happen, there was a little bit of a pushback (1) (even
> though super reasonable one) that TCM is just too young at the moment and
> it would be desirable to go through some stabilisation period.
>
> Another idea was that we should not make just guardrails happen but the
> whole config should be in TCM. From what I put together, Sam / Alex does
> not seem to be opposed to this idea, rather the opposite, but having CEP
> about that is way more involved than having just guardrails there. I
> consider guardrails to be kind of special and I do not think that having
> all configurations in TCM (which guardrails are part of) is the absolute
> must in order to deliver that. I may start with guardrails CEP and you may
> explore Capabilities CEP on TCM too, if that makes sense?
>
> I just wanted to raise the point about the time this would be delivered.
> If Capabilities are built on TCM and I wanted to do Guardrails on TCM too
> but was explained it is probably too soon, I guess you would experience
> something similar.
>
> Sam's comment is from May and maybe a lot has changed since in then and
> his comment is not applicable anymore. It would be great to know if we
> could build on top of the current trunk already or we will wait until
> 5.1/6.0 is delivered.
>
> (1)
> https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326
>
> On Fri, Dec 20, 2024 at 2:17 AM Jordan West wrote:
>
>> Firstly, glad to see the support and enthusiasm here and in the recent
>> Slack discussion. I think there is enough for me to start drafting a CEP.
>>
>> Stefan, global configuration and capabilities do have some overlap but
>> not full overlap. For example, you may want to set globally that a cluster
>> enables feature X or control the threshold for a guardrail but you still
>> need to know if all nodes support feature X or have that guardrail, the
>> latter is what capabilities targets. I do think capabilities are a step
>> towards supporting global configuration and the work you described is
>> another step (that we could do after capabilities or in parallel with them
>> in mind). I am also supportive of exploring global configuration for the
>> reasons you mentioned.
>>
>> In terms of how capabilities get propagated across the cluster, I hadn't
>> put much thought into it yet past likely TCM since this will be a new
>> feature that lands after TCM. In Riak, we had gossip (but more mature than
>> C*s -- this was an area I contributed to a lot so very familiar) to
>> disseminate less critical information such as capabilities and a separate
>> layer that did TCM. Since we don't have this in C* I don't think we would
>> want to build a separate distribution channel for capabilities metadata
>> when we already have TCM in place. But I plan to explore this more as I
>> draft the CEP.
>>
>> Jordan
>>
>> On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič wrote:
>>
>>> Hi Jordan,
>>>
>>> what would this look like from the implementation perspective? I was
>>> experimenting with transactional guardrails where an operator would
>>> control the content of a virtual table which would be backed by TCM so
>>> whatever guardrail we would change, this would be automatically and
>>> transparently propagated to every node in a cluster. The POC worked quite
>>> nicely. TCM is just a vehicle to commit a change which would spread
>>> around and all these settings would survive restarts. We would have the
>>> same configuration everywhere which is not currently the case because
>>> guardrails are configured per node and if not persisted to yaml, on
>>> restart their values would be forgotten.
>>>
>>> Guardrails are just an example, what is quite obvious is to expand this
>>> idea to the whole configuration in yaml. Of course, not all properties in
>>> yaml make sense to be the same cluster-wise (ip addresses etc ...), but
>>> the ones which do

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič

I find TCM way more comfortable to work with. The ability to replay the log
on restart and catch up with everything else automatically is a godsend. If
we had that on "good old distributed tables", is it not true that we would
need to take extra care of it, e.g. we would need to repair it etc.? That
might be a source of discrepancies / disagreements. TCM is just
"maintenance-free" and _just works_.

I think I was also investigating distributed tables but was just pulled
towards TCM naturally because of its goodies.
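
To make the replay-on-restart point concrete, here is a minimal sketch in
Python. It is not Cassandra code: MetadataLog, Node and the guardrail names
are hypothetical, and the real TCM log is considerably richer. It only shows
why a node that remembers its last applied epoch can catch up without any
separate repair step.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataLog:
    """Hypothetical append-only log; entry i carries epoch i + 1."""
    entries: list = field(default_factory=list)

    def append(self, key, value):
        self.entries.append((key, value))
        return len(self.entries)  # epoch assigned to this entry

    def since(self, epoch):
        # Entries with an epoch strictly greater than `epoch`, in order.
        return list(enumerate(self.entries[epoch:], start=epoch + 1))


@dataclass
class Node:
    """Hypothetical node that remembers the last epoch it applied."""
    applied_epoch: int = 0
    settings: dict = field(default_factory=dict)

    def catch_up(self, log):
        # Replay only what was missed while the node was away.
        for epoch, (key, value) in log.since(self.applied_epoch):
            self.settings[key] = value
            self.applied_epoch = epoch


log = MetadataLog()
node = Node()
log.append("guardrail_a", 10)
node.catch_up(log)

# The node is "down" while two more changes are committed.
log.append("guardrail_a", 20)
log.append("guardrail_b", True)

node.catch_up(log)  # on restart, the node replays epochs 2 and 3
```

The node never has to reconcile divergent copies; it only needs the ordered
entries it has not yet applied.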

On Fri, Dec 20, 2024 at 10:08 AM Benedict  wrote:

> TCM is a perfectly valid basis for this, but TCM is only really
> *necessary* to solve meta config problems where we can’t rely on the rest
> of the database working. Particularly placement issues, which is why schema
> and membership need to live there.
>
> It should be possible to use distributed system tables just fine for
> capabilities, config and guardrails.
>
> That said, it’s possible config might be better represented as part of the
> schema (and we already store some relevant config there) in which case it
> would live in TCM automatically. Migrating existing configs to a
> distributed setup will be fun however we do it though.
>
> Capabilities also feel naturally related to other membership information,
> so TCM might be the most suitable place, particularly for handling
> downgrades after capabilities have been enabled (if we ever expect to
> support turning off capabilities and then downgrading - which today we
> mostly don’t).
>
> On 20 Dec 2024, at 08:42, Štefan Miklošovič 
> wrote:
>
> 
> Jordan,
>
> I also think that having it on TCM would be ideal and we should explore
> this path first before doing anything custom.
>
> Regarding my idea about the guardrails in TCM, when I prototyped that and
> wanted to make it happen, there was a little bit of a pushback (1) (even
> though super reasonable one) that TCM is just too young at the moment and
> it would be desirable to go through some stabilisation period.
>
> Another idea was that we should not make just guardrails happen but the
> whole config should be in TCM. From what I put together, Sam / Alex does
> not seem to be opposed to this idea, rather the opposite, but having CEP
> about that is way more involved than having just guardrails there. I
> consider guardrails to be kind of special and I do not think that having
> all configurations in TCM (which guardrails are part of) is the absolute
> must in order to deliver that. I may start with guardrails CEP and you may
> explore Capabilities CEP on TCM too, if that makes sense?
>
> I just wanted to raise the point about when this would be delivered. If
> Capabilities are built on TCM, I guess you would experience something
> similar to what I did: I wanted to do Guardrails on TCM too, but was told
> it is probably too soon.
>
> Sam's comment is from May, and maybe a lot has changed since then and
> his comment is no longer applicable. It would be great to know whether we
> could build on top of the current trunk already or whether we will wait
> until 5.1/6.0 is delivered.
>
> (1)
> https://issues.apache.org/jira/browse/CASSANDRA-19593?focusedCommentId=17844326&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17844326
>
> On Fri, Dec 20, 2024 at 2:17 AM Jordan West  wrote:
>
>> Firstly, glad to see the support and enthusiasm here and in the recent
>> Slack discussion. I think there is enough for me to start drafting a CEP.
>>
>> Stefan, global configuration and capabilities do have some overlap but
>> not full overlap. For example, you may want to set globally that a cluster
>> enables feature X or control the threshold for a guardrail but you still
>> need to know if all nodes support feature X or have that guardrail, the
>> latter is what capabilities targets. I do think capabilities are a step
>> towards supporting global configuration and the work you described is
>> another step (that we could do after capabilities or in parallel with them
>> in mind). I am also supportive of exploring global configuration for the
>> reasons you mentioned.
>>
>> In terms of how capabilities get propagated across the cluster, I hadn't
>> put much thought into it yet past likely TCM since this will be a new
>> feature that lands after TCM. In Riak, we had gossip (but more mature than
>> C*s -- this was an area I contributed to a lot so very familiar) to
>> disseminate less critical information such as capabilities and a separate
>> layer that did TCM. Since we don't have this in C* I don't think we would
>> want to build a separate distribution channel for capabilities metadata
>> when we already have TCM in place. But I plan to explore this more as I
>> draft the CEP.
>>
>> Jordan
>>
>> On Thu, Dec 19, 2024 at 1:48 PM Štefan Miklošovič 
>> wrote:
>>
>>> Hi Jordan,
>>>
>>> what would this look like from the implementation perspective? I was
>>> experimenting with transactional guardrails where

Re: Capabilities

2024-12-20 Thread Benedict

If you perform a read from a distributed table on startup you will find
the latest information. What catchup are you thinking of? I don’t think any
of the features we talked about need a log, only the latest information.

We can (and should) probably introduce event listeners for distributed
tables, as this is also a really great feature, but I don’t think this
should be necessary here.

Regarding disagreements: if you use LWTs then there are no consistency
issues to worry about.

Again, I’m not opposed to using TCM, although I am a little worried TCM is
becoming our new hammer with everything a nail. It would be better IMO to
keep TCM scoped to essential functionality as it’s critical to correctness.
Perhaps we could extend its APIs to less critical services without
intertwining them with membership, schema and epoch handling.

On 20 Dec 2024, at 09:43, Štefan Miklošovič wrote:

> I find TCM way more comfortable to work with. The capability of log being
> replayed on restart and catching up with everything else automatically is
> god-sent. If we had that on "good old distributed tables", then is it not
> true that we would need to take extra care of that, e.g. we would need to
> repair it etc ... It might be the source of the discrepancies /
> disagreements etc. TCM is just "maintenance-free" and _just works_.
>
> I think I was also investigating distributed tables but was just pulled
> towards TCM naturally because of its goodies.
>
> On Fri, Dec 20, 2024 at 10:08 AM Benedict wrote:
>
>> TCM is a perfectly valid basis for this, but TCM is only really
>> *necessary* to solve meta config problems where we can’t rely on the rest
>> of the database working.

Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Caleb Rackliffe

So that would look something like...

SELECT ... FROM ... WHERE ... WITH OPTIONS = { 'exclude_indexes' :
[, ] }
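
As a rough illustration of how such an option could be interpreted, here is
a small Python sketch. Everything in it is an assumption for discussion:
usable_indexes and the index names idx_a, idx_b and idx_c are hypothetical,
and only the 'exclude_indexes' key from the example above is understood.

```python
def usable_indexes(candidates, options):
    """Return the candidate index names left after applying query options.

    `options` mimics the map in WITH OPTIONS = {...}; only the
    'exclude_indexes' key is understood in this sketch.
    """
    unknown = set(options) - {"exclude_indexes"}
    if unknown:
        raise ValueError(f"unsupported query options: {sorted(unknown)}")
    excluded = set(options.get("exclude_indexes", []))
    missing = excluded - set(candidates)
    if missing:
        raise ValueError(f"unknown indexes: {sorted(missing)}")
    # Preserve candidate order so any existing preference is kept.
    return [name for name in candidates if name not in excluded]


remaining = usable_indexes(["idx_a", "idx_b", "idx_c"],
                           {"exclude_indexes": ["idx_b"]})
```

Rejecting unknown keys and unknown index names up front keeps a typo in the
options map from silently changing which indexes are used.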

On Fri, Dec 20, 2024 at 5:36 PM Caleb Rackliffe 
wrote:

> You mean like to control the tokenization/analysis of query terms?
>
> On Fri, Dec 20, 2024 at 4:38 PM Jeremiah Jordan 
> wrote:
>
>> Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}.  If we
>> move into allowing analysis/tokenization on indexed items, then a more
>> general WITH OPTIONS would be useful for that too.  That would let us add
>> any other new options to a SELECT without needing to modify the grammar
>> further.
>>
>> -Jeremiah
>>
>> On Dec 20, 2024 at 2:28:58 PM, Caleb Rackliffe 
>> wrote:
>>
>>> Some of you are probably familiar with work in the DS fork to improve
>>> the selection of indexes for SAI queries in
>>> https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424
>>> .
>>>
>>> While I'm eagerly anticipating working on that in the new year, I'm also
>>> wondering whether we think some simple CQL extensions to manually control
>>> index selection would be helpful. Maxwell proposed this a while back
>>> in CASSANDRA-18112, and I'd like to propose a syntax:
>>>
>>>
>>> ex. Do not use the specified index during the query.
>>>
>>> SELECT ... FROM ... WHERE ... WITHOUT INDEX 
>>>
>>> This could be helpful for intersection queries where one of the provided
>>> clauses is not very selective and could simply be handled via
>>> post-filtering.
>>>
>>> ex. Require the specified index to be used.
>>>
>>> SELECT ... FROM ... WHERE ... WITH INDEX 
>>>
>>> This could be helpful in scenarios where multiple indexes exist on a
>>> column and was the primary motivation for CASSANDRA-18112.
>>>
>>> Thoughts?
>>>
>>

Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Caleb Rackliffe

You mean like to control the tokenization/analysis of query terms?

On Fri, Dec 20, 2024 at 4:38 PM Jeremiah Jordan 
wrote:

> Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}.  If we
> move into allowing analysis/tokenization on indexed items, then a more
> general WITH OPTIONS would be useful for that too.  That would let us add
> any other new options to a SELECT without needing to modify the grammar
> further.
>
> -Jeremiah
>
> On Dec 20, 2024 at 2:28:58 PM, Caleb Rackliffe 
> wrote:
>
>> Some of you are probably familiar with work in the DS fork to improve
>> the selection of indexes for SAI queries in
>> https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424
>> .
>>
>> While I'm eagerly anticipating working on that in the new year, I'm also
>> wondering whether we think some simple CQL extensions to manually control
>> index selection would be helpful. Maxwell proposed this a while back
>> in CASSANDRA-18112, and I'd like to propose a syntax:
>>
>>
>> ex. Do not use the specified index during the query.
>>
>> SELECT ... FROM ... WHERE ... WITHOUT INDEX 
>>
>> This could be helpful for intersection queries where one of the provided
>> clauses is not very selective and could simply be handled via
>> post-filtering.
>>
>> ex. Require the specified index to be used.
>>
>> SELECT ... FROM ... WHERE ... WITH INDEX 
>>
>> This could be helpful in scenarios where multiple indexes exist on a
>> column and was the primary motivation for CASSANDRA-18112.
>>
>> Thoughts?
>>
>

Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Joel Shepherd


WITH INDEX (or something equivalent) seems really useful.

I'm less opinionated on the specific syntax, but I think there is a lot of
value, in the form of predictable, controllable performance, in giving
developers more direct control over query execution, whether that's
index selection or even lower-level decisions. If you've experienced the 
thrill of operating a database with a cost-based planner that abruptly 
selects a new, sub-optimal plan due to a change in statistics or 
configuration, you'll appreciate language features that yield some 
planning control back to you. It does increase the burden on the 
developer to understand how best to execute the query, but it makes 
their intent much more obvious, and easier to adjust as the system changes.


-- Joel.

On 12/20/2024 12:28 PM, Caleb Rackliffe wrote:
> Some of you are probably familiar with work in the DS fork to improve
> the selection of indexes for SAI queries in
> https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424.
>
> While I'm eagerly anticipating working on that in the new year, I'm
> also wondering whether we think some simple CQL extensions to manually
> control index selection would be helpful. Maxwell proposed this a
> while back in CASSANDRA-18112, and I'd like to propose a syntax:
>
> ex. Do not use the specified index during the query.
>
> SELECT ... FROM ... WHERE ... WITHOUT INDEX 
>
> This could be helpful for intersection queries where one of the
> provided clauses is not very selective and could simply be handled via
> post-filtering.
>
> ex. Require the specified index to be used.
>
> SELECT ... FROM ... WHERE ... WITH INDEX 
>
> This could be helpful in scenarios where multiple indexes exist on a
> column and was the primary motivation for CASSANDRA-18112.
>
> Thoughts?

Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Jeremiah Jordan

>
> On Fri, Dec 20, 2024 at 5:36 PM Caleb Rackliffe 
> wrote:
>
>> You mean like to control the tokenization/analysis of query terms?
>>
>
Yes.  Elasticsearch, for example, lets you specify the query-time analyzer
in the query, overriding what is specified at the index level.

https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html#specify-search-query-analyzer
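
A minimal sketch of that precedence rule, in Python. The 'analyzer' key and
the effective_analyzer helper are hypothetical and do not correspond to any
existing CQL or SAI option; the point is only that a query-level setting
wins over the index-level one.

```python
def effective_analyzer(index_options, query_options):
    """Pick the analyzer for query terms: the query-level setting wins."""
    if "analyzer" in query_options:
        return query_options["analyzer"]
    return index_options["analyzer"]


index_options = {"analyzer": "standard"}     # configured on the index
default_choice = effective_analyzer(index_options, {})
overridden = effective_analyzer(index_options, {"analyzer": "whitespace"})
```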

On Dec 20, 2024 at 5:37:58 PM, Caleb Rackliffe 
wrote:

> So that would look something like...
>
> SELECT ... FROM ... WHERE ... WITH OPTIONS = { 'exclude_indexes' :
> [, ] }
>

Yeah something like that would work.

On Dec 20, 2024 at 5:37:58 PM, Caleb Rackliffe 
wrote:

> So that would look something like...
>
> SELECT ... FROM ... WHERE ... WITH OPTIONS = { 'exclude_indexes' :
> [, ] }
>
> On Fri, Dec 20, 2024 at 5:36 PM Caleb Rackliffe 
> wrote:
>
>> You mean like to control the tokenization/analysis of query terms?
>>
>> On Fri, Dec 20, 2024 at 4:38 PM Jeremiah Jordan <
>> jeremiah.jor...@gmail.com> wrote:
>>
>>> Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}.  If we
>>> move into allowing analysis/tokenization on indexed items, then a more
>>> general WITH OPTIONS would be useful for that too.  That would let us add
>>> any other new options to a SELECT without needing to modify the grammar
>>> further.
>>>
>>> -Jeremiah
>>>
>>> On Dec 20, 2024 at 2:28:58 PM, Caleb Rackliffe 
>>> wrote:
>>>
>>>> Some of you are probably familiar with work in the DS fork to improve
>>>> the selection of indexes for SAI queries in
>>>> https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424
>>>> .
>>>>
>>>> While I'm eagerly anticipating working on that in the new year, I'm
>>>> also wondering whether we think some simple CQL extensions to manually
>>>> control index selection would be helpful. Maxwell proposed this a while
>>>> back in CASSANDRA-18112, and I'd like to propose a syntax:
>>>>
>>>> ex. Do not use the specified index during the query.
>>>>
>>>> SELECT ... FROM ... WHERE ... WITHOUT INDEX 
>>>>
>>>> This could be helpful for intersection queries where one of the
>>>> provided clauses is not very selective and could simply be handled via
>>>> post-filtering.
>>>>
>>>> ex. Require the specified index to be used.
>>>>
>>>> SELECT ... FROM ... WHERE ... WITH INDEX 
>>>>
>>>> This could be helpful in scenarios where multiple indexes exist on a
>>>> column and was the primary motivation for CASSANDRA-18112.
>>>>
>>>> Thoughts?

>>>

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič

I stand corrected: the C in TCM stands for "cluster" :D Anyway, it is still
super reasonable to put configuration there.

On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič 
wrote:

> I am super hesitant to base distributed guardrails or any configuration
> for that matter on anything but TCM. Does not "C" in TCM stand for
> "configuration" anyway? So rename it to TSM like "schema" then if it is
> meant to be just for that. It seems to be quite ridiculous to code tables
> with caches on top when we have way more effective tooling thanks to CEP-21
> to deal with that with clear advantages of getting rid of all of that old
> mechanism we have in place.
>
> I have not seen any concrete examples of risks why using TCM should be
> just for what it is currently for. Why not put the configuration meant to
> be cluster-wide into that?
>
> What is it ... performance? What does even the term "additional
> complexity" mean? Complex in what? Do you think that putting there 3 types
> of transformations in case of guardrails which flip some booleans and
> numbers would suddenly make TCM way more complex? Come on ...
>
> This has nothing to do with what Jordan is trying to introduce. I think we
> all agree he knows what he is doing and if he evaluates that TCM is too
> much for his use case (or it is not a good fit) that is perfectly fine.
>
> On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta  wrote:
>
>> > It should be possible to use distributed system tables just fine for
>> capabilities, config and guardrails.
>>
>> I have been thinking about this recently and I agree we should be wary
>> about introducing new TCM states and create additional complexity that can
>> be serviced by existing data dissemination mechanisms (gossip/system
>> tables). I would prefer that we take a more phased and incremental approach
>> to introduce new TCM states.
>>
>> As a way to accomplish that, I have thought about introducing a new
>> generic TCM state "In Maintenance", where schema or membership changes are
>> "frozen/disallowed" while an external operation is taking place. This
>> "external operation" could mean many things:
>> - Upgrade
>> - Downgrade
>> - Migration
>> - Capability Enablement/Disablement
>>
>> These could be sub-states of the "Maintenance" TCM state, that could be
>> managed externally (via cache/gossip/system tables/sidecar). Once these
>> sub-states are validated thoroughly and mature enough, we could "promote"
>> them to top-level TCM states.
>>
>> In the end what really matters is that cluster and schema membership
>> changes do not happen while a miscellaneous operation is taking place.
>>
>> Would this make sense as an initial way to integrate TCM with
>> capabilities framework ?
>>
>> On Fri, Dec 20, 2024 at 4:53 AM Benedict  wrote:
>>
>>> If you perform a read from a distributed table on startup you will find
>>> the latest information. What catchup are you thinking of? I don’t think any
>>> of the features we talked about need a log, only the latest information.
>>>
>>> We can (and should) probably introduce event listeners for distributed
>>> tables, as this is also a really great feature, but I don’t think this
>>> should be necessary here.
>>>
>>> Regarding disagreements: if you use LWTs then there are no consistency
>>> issues to worry about.
>>>
>>> Again, I’m not opposed to using TCM, although I am a little worried TCM
>>> is becoming our new hammer with everything a nail. It would be better IMO
>>> to keep TCM scoped to essential functionality as it’s critical to
>>> correctness. Perhaps we could extend its APIs to less critical services
>>> without intertwining them with membership, schema and epoch handling.
>>>
>>> On 20 Dec 2024, at 09:43, Štefan Miklošovič 
>>> wrote:
>>>
>>> 
>>> I find TCM way more comfortable to work with. The capability of log
>>> being replayed on restart and catching up with everything else
>>> automatically is god-sent. If we had that on "good old distributed tables",
>>> then is it not true that we would need to take extra care of that, e.g. we
>>> would need to repair it etc ... It might be the source of the discrepancies
>>> / disagreements etc. TCM is just "maintenance-free" and _just works_.
>>>
>>> I think I was also investigating distributed tables but was just pulled
>>> towards TCM naturally because of its goodies.
>>>
>>> On Fri, Dec 20, 2024 at 10:08 AM Benedict  wrote:
>>>
>>>> TCM is a perfectly valid basis for this, but TCM is only really
>>>> *necessary* to solve meta config problems where we can’t rely on the rest
>>>> of the database working. Particularly placement issues, which is why schema
>>>> and membership need to live there.
>>>>
>>>> It should be possible to use distributed system tables just fine for
>>>> capabilities, config and guardrails.
>>>>
>>>> That said, it’s possible config might be better represented as part of
>>>> the schema (and we already store some relevant config there) in which case
>>>> it would live in TCM automatically. Migrating existing configs t

Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Jeremiah Jordan

 Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}.  If we
move into allowing analysis/tokenization on indexed items, then a more
general WITH OPTIONS would be useful for that too.  That would let us add
any other new options to a SELECT without needing to modify the grammar
further.

-Jeremiah

On Dec 20, 2024 at 2:28:58 PM, Caleb Rackliffe 
wrote:

> Some of you are probably familiar with work in the DS fork to improve the
> selection of indexes for SAI queries in
> https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424
> .
>
> While I'm eagerly anticipating working on that in the new year, I'm also
> wondering whether we think some simple CQL extensions to manually control
> index selection would be helpful. Maxwell proposed this a while back
> in CASSANDRA-18112, and I'd like to propose a syntax:
>
>
> ex. Do not use the specified index during the query.
>
> SELECT ... FROM ... WHERE ... WITHOUT INDEX 
>
> This could be helpful for intersection queries where one of the provided
> clauses is not very selective and could simply be handled via
> post-filtering.
>
> ex. Require the specified index to be used.
>
> SELECT ... FROM ... WHERE ... WITH INDEX 
>
> This could be helpful in scenarios where multiple indexes exist on a
> column and was the primary motivation for CASSANDRA-18112.
>
> Thoughts?
>

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-20 Thread Paul Chandler

Hi Brandon,

That sounds good. Will that fix be in 4.1, as it is the old nodes that don’t 
transmit the hints?

Thanks 

Paul 

> On 20 Dec 2024, at 13:41, Brandon Williams  wrote:
> 
> I think after a discussion on #cassandra-dev yesterday, we are going
> to remove the requirement for schema agreement to deliver hints, as
> suggested by Jeff Jirsa.
> 
> Kind Regards,
> Brandon
> 
> On Thu, Dec 19, 2024 at 7:43 AM Paul Chandler  wrote:
>> 
>> Hi Brandon,
>> 
>> I am not sure which part changes after CASSANDRA-20118, there is still the 
>> system mismatch going to CASSANDRA_4 caused by the change in 
>> system.compaction_history, and going to UPGRADING, this is caused by the 2 
>> different sstable formats, so nothing that CASSANDRA-20118 fixes.
>> 
>> So while CASSANDRA-20118 improves things, it does not fix these specific 
>> issues, unless I have missed something?
>> 
>>> On 19 Dec 2024, at 12:17, Brandon Williams  wrote:
>>> 
>>> On Thu, Dec 19, 2024 at 4:11 AM Paul Chandler  wrote:
>>>> C*4 -> CASSANDRA_4: There is a schema mismatch, and hints are not sent
>>>> from C*4 node to C*5 nodes.
>>>> CASSANDRA_4 -> UPGRADING: Repairs are not possible and Nodes cannot be
>>>> added or replaced.
>>>> UPGRADING -> NONE: No issues.
>>> 
>>> I'll note this will change after CASSANDRA-20118
>>> 
>>>> Any thoughts on whether having SCM controlled by JMX/nodetool is a good
>>>> idea?
>>> 
>>> I think it's a good idea but it's tricky.  As I said on 20118, "An
>>> unfortunate consequence of our use of static initialization is that
>>> once started, there is no way to change storage compatibility modes"
>>> and all the columns are defined statically, so that will have to be
>>> overcome.
>>> 
>>> Kind Regards,
>>> Brandon
>>

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič

What do you mean by a distributed table? Do you mean the tables in the
system_distributed keyspace?

If so, imagine we introduce a table system_distributed.guardrails where
each row holds the value a guardrail is set to. On guardrail evaluation at
runtime (and there are a bunch of them to consider), would it read this
table every single time, basically performing a select query for this or
that guardrail to see what its value is? There are plenty of places where
guardrails are evaluated; would this not slow things down considerably?

So, if we do not want to do that, would we start to cache it? So table +
cache? Isn't this becoming just too complicated?
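
For the sake of discussion, the "table + cache" variant could look roughly
like the Python sketch below. It is an illustration only: CachedGuardrails,
the 30 second refresh interval and the max_tables guardrail are all
hypothetical, and load() merely stands in for a SELECT against the table.

```python
import time


class CachedGuardrails:
    """Hot-path reads hit a local dict; the table is read only on refresh."""

    def __init__(self, load_from_table, refresh_seconds=30.0,
                 clock=time.monotonic):
        self._load = load_from_table      # stands in for a SELECT on the table
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._values = load_from_table()
        self._loaded_at = clock()

    def get(self, name):
        # Refresh lazily once the cached copy is older than the interval.
        if self._clock() - self._loaded_at >= self._refresh_seconds:
            self._values = self._load()
            self._loaded_at = self._clock()
        return self._values[name]


table = {"max_tables": 100}    # pretend contents of the guardrails table
reads = []                     # records every simulated table read


def load():
    reads.append(1)
    return dict(table)


now = [0.0]                    # fake clock so the example is deterministic
cache = CachedGuardrails(load, refresh_seconds=30.0, clock=lambda: now[0])

first = cache.get("max_tables")      # served from the copy loaded at start
table["max_tables"] = 200            # an operator changes the table
stale = cache.get("max_tables")      # still the old value, no table read
now[0] = 31.0                        # time passes beyond the interval
fresh = cache.get("max_tables")      # triggers exactly one reload
```

Reads stay cheap, but every node carries its own refresh logic and may serve
a stale value for up to one refresh interval.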

With guardrails in TCM, if we commit a transformation from some node saying
that guardrail xyz changed its state from false to true, this eventually
gets propagated to every single node in some epoch, so there is no reason
to read guardrails from any table. A node applies this transformation to
itself as it digests the new epochs it pulled from the CMS.
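
A toy model of that flow, in Python, under heavy simplification. Cms, Node
and set_guardrail are hypothetical names rather than the actual TCM classes;
the sketch only shows a transformation being committed once, applied by each
node in epoch order, and then read locally.

```python
class Cms:
    """Hypothetical stand-in for the CMS: orders transformations by epoch."""

    def __init__(self):
        self.log = []

    def commit(self, transformation):
        self.log.append(transformation)
        return len(self.log)       # epoch of the committed transformation


class Node:
    def __init__(self):
        self.epoch = 0
        self.guardrails = {"xyz": False}

    def pull(self, cms):
        # Apply each unseen transformation in epoch order.
        while self.epoch < len(cms.log):
            self.guardrails = cms.log[self.epoch](self.guardrails)
            self.epoch += 1

    def is_enabled(self, name):
        return self.guardrails[name]   # local read, no query involved


def set_guardrail(name, value):
    # A transformation is a pure function from old state to new state.
    return lambda state: {**state, name: value}


cms = Cms()
nodes = [Node(), Node(), Node()]
epoch = cms.commit(set_guardrail("xyz", True))
before = [n.is_enabled("xyz") for n in nodes]
for n in nodes:
    n.pull(cms)
after = [n.is_enabled("xyz") for n in nodes]
```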

The point about the hammer and everything being a nail resonates with me. I
agree we should be cautious not to "bastardize" TCM by using it for
something unnecessary, but on the other hand we should be open to exploring
what such an implementation would mean _in detail_ (which is what we are
doing here) before ruling it out for good.

I am all ears if you see how it should work differently. I am still in the
process of putting all the parts of the puzzle together, so please be so
kind as to prove me wrong.

Regards

On Fri, Dec 20, 2024 at 10:53 AM Benedict  wrote:

> If you perform a read from a distributed table on startup you will find
> the latest information. What catchup are you thinking of? I don’t think any
> of the features we talked about need a log, only the latest information.
>
> We can (and should) probably introduce event listeners for distributed
> tables, as this is also a really great feature, but I don’t think this
> should be necessary here.
>
> Regarding disagreements: if you use LWTs then there are no consistency
> issues to worry about.
>
> Again, I’m not opposed to using TCM, although I am a little worried TCM is
> becoming our new hammer with everything a nail. It would be better IMO to
> keep TCM scoped to essential functionality as it’s critical to correctness.
> Perhaps we could extend its APIs to less critical services without
> intertwining them with membership, schema and epoch handling.
>
> On 20 Dec 2024, at 09:43, Štefan Miklošovič 
> wrote:
>
> 
> I find TCM way more comfortable to work with. The capability of log being
> replayed on restart and catching up with everything else automatically is
> god-sent. If we had that on "good old distributed tables", then is it not
> true that we would need to take extra care of that, e.g. we would need to
> repair it etc ... It might be the source of the discrepancies /
> disagreements etc. TCM is just "maintenance-free" and _just works_.
>
> I think I was also investigating distributed tables but was just pulled
> towards TCM naturally because of its goodies.
>
> On Fri, Dec 20, 2024 at 10:08 AM Benedict  wrote:
>
>> TCM is a perfectly valid basis for this, but TCM is only really
>> *necessary* to solve meta config problems where we can’t rely on the rest
>> of the database working. Particularly placement issues, which is why schema
>> and membership need to live there.
>>
>> It should be possible to use distributed system tables just fine for
>> capabilities, config and guardrails.
>>
>> That said, it’s possible config might be better represented as part of
>> the schema (and we already store some relevant config there) in which case
>> it would live in TCM automatically. Migrating existing configs to a
>> distributed setup will be fun however we do it though.
>>
>> Capabilities also feel naturally related to other membership information,
>> so TCM might be the most suitable place, particularly for handling
>> downgrades after capabilities have been enabled (if we ever expect to
>> support turning off capabilities and then downgrading - which today we
>> mostly don’t).
>>
>> On 20 Dec 2024, at 08:42, Štefan Miklošovič 
>> wrote:
>>
>> 
>> Jordan,
>>
>> I also think that having it on TCM would be ideal and we should explore
>> this path first before doing anything custom.
>>
>> Regarding my idea about the guardrails in TCM, when I prototyped that and
>> wanted to make it happen, there was a little bit of a pushback (1) (even
>> though super reasonable one) that TCM is just too young at the moment and
>> it would be desirable to go through some stabilisation period.
>>
>> Another idea was that we should not make just guardrails happen but the
>> whole config should be in TCM. From what I put together, Sam / Alex does
>> not seem to be opposed to this idea, rather the opposite, but having CEP
>> about that is way more involved than having just guardrails there. I
>> consider guardrails to be kind of special and I do not think that having
>> all configurations

Re: Capabilities

2024-12-20 Thread Benedict

Guardrails are broadly the same as Auth which works this way, but with less
criticality. It’s fine if guardrails are updated slowly.

But, again, TCM is a fine target for this. It would however be nice to have
an in-between capability though, TCM-lite if you will, for these features.
Perhaps even just a parallel TCM log.

On 20 Dec 2024, at 10:24, Štefan Miklošovič wrote:

> What do you mean by a distributed table? You mean these in
> system_distributed keyspace?
>
> If so, imagine we introduce a table system_distributed.guardrails where
> each row would hold what a guardrail would be set to, hence on guardrails
> evaluation in runtime (and there are a bunch of them to consider), it would
> read this table every single time? Basically performing a select query on
> this and that guardrail to see what its value is? There are plenty of
> places where guardrails are evaluated, would not this slow things down
> considerably?
>
> So, if we do not want to do that, would we start to cache it? So table +
> cache? Isn't this becoming just too complicated?
>
> With guardrails in TCM, if we commit a transformation from some node that a
> guardrail xyz changed its state from false to true, this gets propagated to
> every single node in some epoch eventually so there is no reason to read
> them from any table. A node would apply this transformation to itself as it
> digests new epochs it pulled from cms.
>
> The point about hammer and all things being nails resonates with me. I
> agree we should be cautious about this to not "bastardize" TCM by using it
> for something unnecessary, but on the other hand we should be open to
> exploring what such an implementation would mean _in details_ (what we do
> here) before ruling it out for good.
>
> I am all ears if you guys see how it should work differently, I am still in
> the process of putting all parts of the puzzle together so please be so
> nice to prove me wrong.
> Regards
>
> On Fri, Dec 20, 2024 at 10:53 AM Benedict wrote:
>
>> If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information.
>>
>> We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here.
>>
>> Regarding disagreements: if you use LWTs then there are no consistency issues to worry about.
>>
>> Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail. It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. Perhaps we could extend its APIs to less critical services without intertwining them with membership, schema and epoch handling.
>>
>> On 20 Dec 2024, at 09:43, Štefan Miklošovič wrote:
>>
>>> I find TCM way more comfortable to work with. The capability of log being replayed on restart and catching up with everything else automatically is god-sent. If we had that on "good old distributed tables", then is it not true that we would need to take extra care of that, e.g. we would need to repair it etc ... It might be the source of the discrepancies / disagreements etc. TCM is just "maintenance-free" and _just works_.
>>>
>>> I think I was also investigating distributed tables but was just pulled towards TCM naturally because of its goodies.
>>>
>>> On Fri, Dec 20, 2024 at 10:08 AM Benedict wrote:
>>>
>>>> TCM is a perfectly valid basis for this, but TCM is only really *necessary* to solve meta config problems where we can’t rely on the rest of the database working. Particularly placement issues, which is why schema and membership need to live there.
>>>>
>>>> It should be possible to use distributed system tables just fine for capabilities, config and guardrails.
>>>>
>>>> That said, it’s possible config might be better represented as part of the schema (and we already store some relevant config there) in which case it would live in TCM automatically. Migrating existing configs to a distributed setup will be fun however we do it though.
>>>>
>>>> Capabilities also feel naturally related to other membership information, so TCM might be the most suitable place, particularly for handling downgrades after capabilities have been enabled (if we ever expect to support turning off capabilities and then downgrading - which today we mostly don’t).
>>>>
>>>> On 20 Dec 2024, at 08:42, Štefan Miklošovič wrote:
>>>>
>>>>> Jordan,
>>>>>
>>>>> I also think that having it on TCM would be ideal and we should explore this path first before doing anything custom.
>>>>>
>>>>> Regarding my idea about the guardrails in TCM, when I prototyped that and wanted to make it happen, there was a little bit of a pushback (1) (even though super reasonable one) that TCM is just too young at the moment and it would be desirable to go through some stabilisation period.
>>>>>
>>>>> Another idea was that we should not make just guardrails happen but the whole config should be in TCM. From what I put together, Sa
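
The "table + cache" option raised in this message (guardrails handled the way Auth is, where slow updates are acceptable) can be pictured with a minimal sketch. Everything here is an assumption for illustration: `CachedGuardrails`, `load_from_table` and the 30-second interval are hypothetical names and values, not Cassandra APIs. The point is only that evaluation never touches the table; a separate, slower path refreshes the local copy.

```python
import time


class CachedGuardrails:
    """Hypothetical node-local cache in front of a distributed table."""

    def __init__(self, load_from_table, refresh_seconds=30.0, clock=time.monotonic):
        self._load = load_from_table      # callable returning {guardrail: value}
        self._refresh = refresh_seconds
        self._clock = clock
        self._values = dict(load_from_table())
        self._loaded_at = clock()

    def get(self, name, default=None):
        # Hot path: a dictionary lookup only, never a query.
        return self._values.get(name, default)

    def maybe_refresh(self):
        # Slow path: meant to be called by a background task, not per request.
        now = self._clock()
        if now - self._loaded_at >= self._refresh:
            self._values = dict(self._load())
            self._loaded_at = now
            return True
        return False
```

A node would serve slightly stale values between refreshes, which matches the "updated slowly" tolerance described above.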

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-20 Thread Brandon Williams

That sounds like a possibility to me on the surface.

Kind Regards,
Brandon

On Fri, Dec 20, 2024 at 8:42 AM Paul Chandler  wrote:
>
> Hi Brandon,
>
> That sounds good. Will that fix be in 4.1, as it is the old nodes that don’t 
> transmit the hints?
>
> Thanks
>
> Paul
>
> > On 20 Dec 2024, at 13:41, Brandon Williams  wrote:
> >
> > I think after a discussion on #cassandra-dev yesterday, we are going
> > to remove the requirement for schema agreement to deliver hints, as
> > suggested by Jeff Jirsa.
> >
> > Kind Regards,
> > Brandon
> >
> > On Thu, Dec 19, 2024 at 7:43 AM Paul Chandler  wrote:
> >>
> >> Hi Brandon,
> >>
> >> I am not sure which part changes after CASSANDRA-20118, there is still the 
> >> system mismatch going to CASSANDRA_4 caused by the change in 
> >> system.compaction_history, and going to UPGRADING, this is caused by the 2 
> >> different sstable formats, so nothing that CASSANDRA-20118 fixes.
> >>
> >> So while CASSANDRA-20118 improves things, it does not fix these specific 
> >> issues, unless I have missed something?
> >>
> >>> On 19 Dec 2024, at 12:17, Brandon Williams  wrote:
> >>>
> >>> On Thu, Dec 19, 2024 at 4:11 AM Paul Chandler  wrote:
> >>>> C*4 -> CASSANDRA_4 : There is a schema mismatch, and hints are not sent
> >>>> from C*4 node to C*5 nodes.
> >>>> CASSANDRA_4 -> UPGRADING: Repairs are not possible and Nodes cannot be
> >>>> added or replaced.
> >>>> UPGRADING-> NONE: No issues.
> >>>
> >>> I'll note this will change after CASSANDRA-20118
> >>>
> >>>> Any thoughts on whether having SCM controlled by JMX/nodetool is a good
> >>>> idea?
> >>>
> >>> I think it's a good idea but it's tricky.  As I said on 20118, "An
> >>> unfortunate consequence of our use of static initialization is that
> >>> once started, there is no way to change storage compatibility modes"
> >>> and all the columns are defined statically, so that will have to be
> >>> overcome.
> >>>
> >>> Kind Regards,
> >>> Brandon
> >>
>

Re: Capabilities

2024-12-20 Thread Paulo Motta

Apologies, I missed the forked thread "Re: Capabilities" before commenting
on this. I think the TCM-lite suggestion there is not incompatible with the
generic "In Maintenance" TCM state that I am proposing, since while in this
state each individual feature could also have their independent/parallel
TCM-lite log separated from the main cluster membership log.

>  I am super hesitant to base distributed guardrails or any configuration
for that matter on anything but TCM.

This is deviating from the thread, but would this not be handled by this:
>  "it’s possible config might be better represented as part of the schema (and
we already store some relevant config there) in which case it would live in
TCM automatically. Migrating existing configs to a distributed setup will
be fun however we do it though."

I have to admit I'm not familiar with the distributed guardrails proposal
to comment on it.

I am just expressing that a generic TCM state like "In Maintenance" could
allow the TCM state machine to be "paused" while a miscellaneous operation
not part of TCM is taking place. This allows more flexibility for
operations to be externally managed without requiring that *everything uses
TCM*, potentially introducing correctness and instability risks.

This does not mean that distributed guardrails or capabilities cannot be
integrated with TCM if it makes sense.
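
To make the "In Maintenance" idea concrete, here is a minimal sketch under stated assumptions. `ClusterState`, `SubState` and `MaintenanceError` are hypothetical names invented for illustration and are not part of TCM; the sub-states mirror the list in the proposal. The sketch only shows the gating behaviour: schema or membership changes are rejected while an externally managed operation is in progress.

```python
from enum import Enum


class SubState(Enum):
    """External operations named in the proposal."""
    UPGRADE = "upgrade"
    DOWNGRADE = "downgrade"
    MIGRATION = "migration"
    CAPABILITY_CHANGE = "capability enablement/disablement"


class MaintenanceError(Exception):
    """Raised when a change is rejected because of the maintenance state."""


class ClusterState:
    """Hypothetical stand-in for the TCM state machine, not Cassandra's API."""

    def __init__(self):
        self.maintenance = None   # None means normal operation
        self.applied = []         # schema/membership changes accepted so far

    def enter_maintenance(self, sub_state):
        if self.maintenance is not None:
            raise MaintenanceError(f"already in maintenance: {self.maintenance.name}")
        self.maintenance = sub_state

    def exit_maintenance(self):
        self.maintenance = None

    def submit(self, change):
        # Schema and membership changes are frozen while an external operation runs.
        if self.maintenance is not None:
            raise MaintenanceError(f"{change!r} rejected during {self.maintenance.name}")
        self.applied.append(change)


def demo():
    state = ClusterState()
    state.submit("ADD NODE n4")
    state.enter_maintenance(SubState.UPGRADE)
    try:
        state.submit("ALTER TABLE ks.t")
        rejected = False
    except MaintenanceError:
        rejected = True
    state.exit_maintenance()
    state.submit("ALTER TABLE ks.t")
    return state.applied, rejected
```

How the sub-state itself is tracked (cache, gossip, system tables, sidecar) is left out on purpose, since the proposal treats that as externally managed.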

On Fri, Dec 20, 2024 at 1:43 PM Štefan Miklošovič 
wrote:

> I am super hesitant to base distributed guardrails or any configuration
> for that matter on anything but TCM. Does not "C" in TCM stand for
> "configuration" anyway? So rename it to TSM like "schema" then if it is
> meant to be just for that. It seems to be quite ridiculous to code tables
> with caches on top when we have way more effective tooling thanks to CEP-21
> to deal with that with clear advantages of getting rid of all of that old
> mechanism we have in place.
>
> I have not seen any concrete examples of risks why using TCM should be
> just for what it is currently for. Why not put the configuration meant to
> be cluster-wide into that?
>
> What is it ... performance? What does even the term "additional
> complexity" mean? Complex in what? Do you think that putting there 3 types
> of transformations in case of guardrails which flip some booleans and
> numbers would suddenly make TCM way more complex? Come on ...
>
> This has nothing to do with what Jordan is trying to introduce. I think we
> all agree he knows what he is doing and if he evaluates that TCM is too
> much for his use case (or it is not a good fit) that is perfectly fine.
>
> On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta  wrote:
>
>> > It should be possible to use distributed system tables just fine for
>> capabilities, config and guardrails.
>>
>> I have been thinking about this recently and I agree we should be wary
>> about introducing new TCM states and create additional complexity that can
>> be serviced by existing data dissemination mechanisms (gossip/system
>> tables). I would prefer that we take a more phased and incremental approach
>> to introduce new TCM states.
>>
>> As a way to accomplish that, I have thought about introducing a new
>> generic TCM state "In Maintenance", where schema or membership changes are
>> "frozen/disallowed" while an external operation is taking place. This
>> "external operation" could mean many things:
>> - Upgrade
>> - Downgrade
>> - Migration
>> - Capability Enablement/Disablement
>>
>> These could be sub-states of the "Maintenance" TCM state, that could be
>> managed externally (via cache/gossip/system tables/sidecar). Once these
>> sub-states are validated thoroughly and mature enough, we could "promote"
>> them to top-level TCM states.
>>
>> In the end what really matters is that cluster and schema membership
>> changes do not happen while a miscellaneous operation is taking place.
>>
>> Would this make sense as an initial way to integrate TCM with
>> capabilities framework ?
>>
>> On Fri, Dec 20, 2024 at 4:53 AM Benedict  wrote:
>>
>>> If you perform a read from a distributed table on startup you will find
>>> the latest information. What catchup are you thinking of? I don’t think any
>>> of the features we talked about need a log, only the latest information.
>>>
>>> We can (and should) probably introduce event listeners for distributed
>>> tables, as this is also a really great feature, but I don’t think this
>>> should be necessary here.
>>>
>>> Regarding disagreements: if you use LWTs then there are no consistency
>>> issues to worry about.
>>>
>>> Again, I’m not opposed to using TCM, although I am a little worried TCM
>>> is becoming our new hammer with everything a nail. It would be better IMO
>>> to keep TCM scoped to essential functionality as it’s critical to
>>> correctness. Perhaps we could extend its APIs to less critical services
>>> without intertwining them with membership, schema and epoch handling.
>>>
>>> On 20 Dec 2024, at 09:43, Štefan Miklošovič 
>>> wrote:
>>>

Re: Capabilities

2024-12-20 Thread Jordan West

On Fri, Dec 20, 2024 at 11:06 AM Benedict  wrote:

> If TCM breaks we all have a really bad time, much worse than if any one of
> these features individually has problems. If you break TCM in the right way
> the cluster could become inoperable, or operations like topology changes
> may be prevented.
>

Benedict, when you say this, are you speaking hypothetically (in the sense
that by using TCM more we increase the probability of using it "wrong" and
hitting an unknown edge case) or are there known ways today that TCM
"breaks"?

Jordan


> This means that even a parallel log has some risk if we end up modifying
> shared functionality.
>
>
>
> On 20 Dec 2024, at 18:47, Štefan Miklošovič 
> wrote:
>
> 
> I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super
> reasonable to be put there.
>
> On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič 
> wrote:
>
>> I am super hesitant to base distributed guardrails or any configuration
>> for that matter on anything but TCM. Does not "C" in TCM stand for
>> "configuration" anyway? So rename it to TSM like "schema" then if it is
>> meant to be just for that. It seems to be quite ridiculous to code tables
>> with caches on top when we have way more effective tooling thanks to CEP-21
>> to deal with that with clear advantages of getting rid of all of that old
>> mechanism we have in place.
>>
>> I have not seen any concrete examples of risks why using TCM should be
>> just for what it is currently for. Why not put the configuration meant to
>> be cluster-wide into that?
>>
>> What is it ... performance? What does even the term "additional
>> complexity" mean? Complex in what? Do you think that putting there 3 types
>> of transformations in case of guardrails which flip some booleans and
>> numbers would suddenly make TCM way more complex? Come on ...
>>
>> This has nothing to do with what Jordan is trying to introduce. I think
>> we all agree he knows what he is doing and if he evaluates that TCM is too
>> much for his use case (or it is not a good fit) that is perfectly fine.
>>
>> On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta  wrote:
>>
>>> > It should be possible to use distributed system tables just fine for
>>> capabilities, config and guardrails.
>>>
>>> I have been thinking about this recently and I agree we should be wary
>>> about introducing new TCM states and create additional complexity that can
>>> be serviced by existing data dissemination mechanisms (gossip/system
>>> tables). I would prefer that we take a more phased and incremental approach
>>> to introduce new TCM states.
>>>
>>> As a way to accomplish that, I have thought about introducing a new
>>> generic TCM state "In Maintenance", where schema or membership changes are
>>> "frozen/disallowed" while an external operation is taking place. This
>>> "external operation" could mean many things:
>>> - Upgrade
>>> - Downgrade
>>> - Migration
>>> - Capability Enablement/Disablement
>>>
>>> These could be sub-states of the "Maintenance" TCM state, that could be
>>> managed externally (via cache/gossip/system tables/sidecar). Once these
>>> sub-states are validated thoroughly and mature enough, we could "promote"
>>> them to top-level TCM states.
>>>
>>> In the end what really matters is that cluster and schema membership
>>> changes do not happen while a miscellaneous operation is taking place.
>>>
>>> Would this make sense as an initial way to integrate TCM with
>>> capabilities framework ?
>>>
>>> On Fri, Dec 20, 2024 at 4:53 AM Benedict  wrote:
>>>
>>>> If you perform a read from a distributed table on startup you will find
>>>> the latest information. What catchup are you thinking of? I don’t think any
>>>> of the features we talked about need a log, only the latest information.
>>>>
>>>> We can (and should) probably introduce event listeners for distributed
>>>> tables, as this is also a really great feature, but I don’t think this
>>>> should be necessary here.
>>>>
>>>> Regarding disagreements: if you use LWTs then there are no consistency
>>>> issues to worry about.
>>>>
>>>> Again, I’m not opposed to using TCM, although I am a little worried TCM
>>>> is becoming our new hammer with everything a nail. It would be better IMO
>>>> to keep TCM scoped to essential functionality as it’s critical to
>>>> correctness. Perhaps we could extend its APIs to less critical services
>>>> without intertwining them with membership, schema and epoch handling.
>>>>
>>>> On 20 Dec 2024, at 09:43, Štefan Miklošovič
>>>> wrote:
>>>>
>>>>
>>>> I find TCM way more comfortable to work with. The capability of log
>>>> being replayed on restart and catching up with everything else
>>>> automatically is god-sent. If we had that on "good old distributed tables",
>>>> then is it not true that we would need to take extra care of that, e.g. we
>>>> would need to repair it etc ... It might be the source of the discrepancies
>>>> / disagreements etc. TCM is just "maintenance-free" and _just works_.
>>>>
>>>> I

Re: Capabilities

2024-12-20 Thread Štefan Miklošovič

I am super hesitant to base distributed guardrails or any configuration for
that matter on anything but TCM. Does not "C" in TCM stand for
"configuration" anyway? So rename it to TSM like "schema" then if it is
meant to be just for that. It seems to be quite ridiculous to code tables
with caches on top when we have way more effective tooling thanks to CEP-21
to deal with that with clear advantages of getting rid of all of that old
mechanism we have in place.

I have not seen any concrete examples of risks why using TCM should be just
for what it is currently for. Why not put the configuration meant to be
cluster-wide into that?

What is it ... performance? What does even the term "additional complexity"
mean? Complex in what? Do you think that putting there 3 types of
transformations in case of guardrails which flip some booleans and numbers
would suddenly make TCM way more complex? Come on ...

This has nothing to do with what Jordan is trying to introduce. I think we
all agree he knows what he is doing and if he evaluates that TCM is too
much for his use case (or it is not a good fit) that is perfectly fine.
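
For readers following along, a minimal sketch of what "3 types of transformations ... which flip some booleans and numbers" could look like. This is an illustration under assumptions, not the prototype: the class names (`SetFlag`, `SetThreshold`, `SetValues`), the guardrail names and the `(epoch, change)` log shape are all hypothetical. It shows a node catching up by applying, in epoch order, the entries it has not yet seen, after which guardrail reads are local lookups.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SetFlag:          # turn a guardrail on or off
    name: str
    enabled: bool


@dataclass(frozen=True)
class SetThreshold:     # change warn/fail limits
    name: str
    warn: int
    fail: int


@dataclass(frozen=True)
class SetValues:        # replace a set of disallowed values
    name: str
    disallowed: frozenset


@dataclass
class GuardrailState:
    """Node-local view; reads are plain dictionary lookups."""
    epoch: int = 0
    flags: dict = field(default_factory=dict)
    thresholds: dict = field(default_factory=dict)
    values: dict = field(default_factory=dict)

    def apply(self, epoch, change):
        if isinstance(change, SetFlag):
            self.flags[change.name] = change.enabled
        elif isinstance(change, SetThreshold):
            self.thresholds[change.name] = (change.warn, change.fail)
        elif isinstance(change, SetValues):
            self.values[change.name] = change.disallowed
        else:
            raise TypeError(f"unknown transformation: {change!r}")
        self.epoch = epoch


def catch_up(state, log):
    """Apply, in epoch order, every log entry the node has not seen yet."""
    for epoch, change in sorted(log, key=lambda entry: entry[0]):
        if epoch > state.epoch:
            state.apply(epoch, change)
    return state


# Hypothetical log contents, for illustration only.
LOG = [
    (1, SetFlag("allow_filtering", True)),
    (2, SetThreshold("tables", warn=100, fail=200)),
    (3, SetFlag("allow_filtering", False)),
    (4, SetValues("table_properties", frozenset({"gc_grace_seconds"}))),
]
```

A node that restarted at epoch 2 would call `catch_up(state, LOG)` and end at epoch 4 without reading any table.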

On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta  wrote:

> > It should be possible to use distributed system tables just fine for
> capabilities, config and guardrails.
>
> I have been thinking about this recently and I agree we should be wary
> about introducing new TCM states and create additional complexity that can
> be serviced by existing data dissemination mechanisms (gossip/system
> tables). I would prefer that we take a more phased and incremental approach
> to introduce new TCM states.
>
> As a way to accomplish that, I have thought about introducing a new
> generic TCM state "In Maintenance", where schema or membership changes are
> "frozen/disallowed" while an external operation is taking place. This
> "external operation" could mean many things:
> - Upgrade
> - Downgrade
> - Migration
> - Capability Enablement/Disablement
>
> These could be sub-states of the "Maintenance" TCM state, that could be
> managed externally (via cache/gossip/system tables/sidecar). Once these
> sub-states are validated thoroughly and mature enough, we could "promote"
> them to top-level TCM states.
>
> In the end what really matters is that cluster and schema membership
> changes do not happen while a miscellaneous operation is taking place.
>
> Would this make sense as an initial way to integrate TCM with capabilities
> framework ?
>
> On Fri, Dec 20, 2024 at 4:53 AM Benedict  wrote:
>
>> If you perform a read from a distributed table on startup you will find
>> the latest information. What catchup are you thinking of? I don’t think any
>> of the features we talked about need a log, only the latest information.
>>
>> We can (and should) probably introduce event listeners for distributed
>> tables, as this is also a really great feature, but I don’t think this
>> should be necessary here.
>>
>> Regarding disagreements: if you use LWTs then there are no consistency
>> issues to worry about.
>>
>> Again, I’m not opposed to using TCM, although I am a little worried TCM
>> is becoming our new hammer with everything a nail. It would be better IMO
>> to keep TCM scoped to essential functionality as it’s critical to
>> correctness. Perhaps we could extend its APIs to less critical services
>> without intertwining them with membership, schema and epoch handling.
>>
>> On 20 Dec 2024, at 09:43, Štefan Miklošovič 
>> wrote:
>>
>> 
>> I find TCM way more comfortable to work with. The capability of log being
>> replayed on restart and catching up with everything else automatically is
>> god-sent. If we had that on "good old distributed tables", then is it not
>> true that we would need to take extra care of that, e.g. we would need to
>> repair it etc ... It might be the source of the discrepancies /
>> disagreements etc. TCM is just "maintenance-free" and _just works_.
>>
>> I think I was also investigating distributed tables but was just pulled
>> towards TCM naturally because of its goodies.
>>
>> On Fri, Dec 20, 2024 at 10:08 AM Benedict  wrote:
>>
>>> TCM is a perfectly valid basis for this, but TCM is only really
>>> *necessary* to solve meta config problems where we can’t rely on the rest
>>> of the database working. Particularly placement issues, which is why schema
>>> and membership need to live there.
>>>
>>> It should be possible to use distributed system tables just fine for
>>> capabilities, config and guardrails.
>>>
>>> That said, it’s possible config might be better represented as part of
>>> the schema (and we already store some relevant config there) in which case
>>> it would live in TCM automatically. Migrating existing configs to a
>>> distributed setup will be fun however we do it though.
>>>
>>> Capabilities also feel naturally related to other membership
>>> information, so TCM might be the most suitable place, particularly for
>>> handling downgrades after capabilities have been enabled (if we ever

Re: Capabilities

2024-12-20 Thread Jon Haddad

I don’t know the details and limits of TCM well enough to comment on what
it can do, but I think it’s fair to say that if we can’t put a few hundred
configuration options in taking up maybe a few MB, there’s a fundamental
problem with it, and we need to seriously reconsider if it’s ready for
production.

Jon

On Fri, Dec 20, 2024 at 10:47 AM Štefan Miklošovič 
wrote:

> I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super
> reasonable to be put there.
>
> On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič 
> wrote:
>
>> I am super hesitant to base distributed guardrails or any configuration
>> for that matter on anything but TCM. Does not "C" in TCM stand for
>> "configuration" anyway? So rename it to TSM like "schema" then if it is
>> meant to be just for that. It seems to be quite ridiculous to code tables
>> with caches on top when we have way more effective tooling thanks to CEP-21
>> to deal with that with clear advantages of getting rid of all of that old
>> mechanism we have in place.
>>
>> I have not seen any concrete examples of risks why using TCM should be
>> just for what it is currently for. Why not put the configuration meant to
>> be cluster-wide into that?
>>
>> What is it ... performance? What does even the term "additional
>> complexity" mean? Complex in what? Do you think that putting there 3 types
>> of transformations in case of guardrails which flip some booleans and
>> numbers would suddenly make TCM way more complex? Come on ...
>>
>> This has nothing to do with what Jordan is trying to introduce. I think
>> we all agree he knows what he is doing and if he evaluates that TCM is too
>> much for his use case (or it is not a good fit) that is perfectly fine.
>>
>> On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta  wrote:
>>
>>> > It should be possible to use distributed system tables just fine for
>>> capabilities, config and guardrails.
>>>
>>> I have been thinking about this recently and I agree we should be wary
>>> about introducing new TCM states and create additional complexity that can
>>> be serviced by existing data dissemination mechanisms (gossip/system
>>> tables). I would prefer that we take a more phased and incremental approach
>>> to introduce new TCM states.
>>>
>>> As a way to accomplish that, I have thought about introducing a new
>>> generic TCM state "In Maintenance", where schema or membership changes are
>>> "frozen/disallowed" while an external operation is taking place. This
>>> "external operation" could mean many things:
>>> - Upgrade
>>> - Downgrade
>>> - Migration
>>> - Capability Enablement/Disablement
>>>
>>> These could be sub-states of the "Maintenance" TCM state, that could be
>>> managed externally (via cache/gossip/system tables/sidecar). Once these
>>> sub-states are validated thoroughly and mature enough, we could "promote"
>>> them to top-level TCM states.
>>>
>>> In the end what really matters is that cluster and schema membership
>>> changes do not happen while a miscellaneous operation is taking place.
>>>
>>> Would this make sense as an initial way to integrate TCM with
>>> capabilities framework ?
>>>
>>> On Fri, Dec 20, 2024 at 4:53 AM Benedict  wrote:
>>>
>>>> If you perform a read from a distributed table on startup you will find
>>>> the latest information. What catchup are you thinking of? I don’t think any
>>>> of the features we talked about need a log, only the latest information.
>>>>
>>>> We can (and should) probably introduce event listeners for distributed
>>>> tables, as this is also a really great feature, but I don’t think this
>>>> should be necessary here.
>>>>
>>>> Regarding disagreements: if you use LWTs then there are no consistency
>>>> issues to worry about.
>>>>
>>>> Again, I’m not opposed to using TCM, although I am a little worried TCM
>>>> is becoming our new hammer with everything a nail. It would be better IMO
>>>> to keep TCM scoped to essential functionality as it’s critical to
>>>> correctness. Perhaps we could extend its APIs to less critical services
>>>> without intertwining them with membership, schema and epoch handling.
>>>>
>>>> On 20 Dec 2024, at 09:43, Štefan Miklošovič
>>>> wrote:
>>>>
>>>> I find TCM way more comfortable to work with. The capability of log
>>>> being replayed on restart and catching up with everything else
>>>> automatically is god-sent. If we had that on "good old distributed tables",
>>>> then is it not true that we would need to take extra care of that, e.g. we
>>>> would need to repair it etc ... It might be the source of the discrepancies
>>>> / disagreements etc. TCM is just "maintenance-free" and _just works_.
>>>>
>>>> I think I was also investigating distributed tables but was just pulled
>>>> towards TCM naturally because of its goodies.
>>>>
>>>> On Fri, Dec 20, 2024 at 10:08 AM Benedict wrote:
>>>>
>>>>> TCM is a perfectly valid basis for this, but TCM is only really
>>>>> *necessary* to solve meta config problems where we can’t rely on the rest
>>

Re: Capabilities

2024-12-20 Thread Benedict

If TCM breaks we all have a really bad time, much worse than if any one of these features individually has problems. If you break TCM in the right way the cluster could become inoperable, or operations like topology changes may be prevented. So, we want to keep its responsibilities scoped sensibly, so we minimise the risk of changes to these features.

This means that even a parallel log has some risk if we end up modifying shared functionality.

On 20 Dec 2024, at 18:47, Štefan Miklošovič wrote:

> I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super reasonable to be put there.
>
> On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič wrote:
>
>> I am super hesitant to base distributed guardrails or any configuration for that matter on anything but TCM. Does not "C" in TCM stand for "configuration" anyway? So rename it to TSM like "schema" then if it is meant to be just for that. It seems to be quite ridiculous to code tables with caches on top when we have way more effective tooling thanks to CEP-21 to deal with that with clear advantages of getting rid of all of that old mechanism we have in place.
>>
>> I have not seen any concrete examples of risks why using TCM should be just for what it is currently for. Why not put the configuration meant to be cluster-wide into that?
>>
>> What is it ... performance? What does even the term "additional complexity" mean? Complex in what? Do you think that putting there 3 types of transformations in case of guardrails which flip some booleans and numbers would suddenly make TCM way more complex? Come on ...
>>
>> This has nothing to do with what Jordan is trying to introduce. I think we all agree he knows what he is doing and if he evaluates that TCM is too much for his use case (or it is not a good fit) that is perfectly fine.
>>
>> On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta wrote:
>>
>>> > It should be possible to use distributed system tables just fine for capabilities, config and guardrails.
>>>
>>> I have been thinking about this recently and I agree we should be wary about introducing new TCM states and create additional complexity that can be serviced by existing data dissemination mechanisms (gossip/system tables). I would prefer that we take a more phased and incremental approach to introduce new TCM states.
>>>
>>> As a way to accomplish that, I have thought about introducing a new generic TCM state "In Maintenance", where schema or membership changes are "frozen/disallowed" while an external operation is taking place. This "external operation" could mean many things:
>>> - Upgrade
>>> - Downgrade
>>> - Migration
>>> - Capability Enablement/Disablement
>>>
>>> These could be sub-states of the "Maintenance" TCM state, that could be managed externally (via cache/gossip/system tables/sidecar). Once these sub-states are validated thoroughly and mature enough, we could "promote" them to top-level TCM states.
>>>
>>> In the end what really matters is that cluster and schema membership changes do not happen while a miscellaneous operation is taking place.
>>>
>>> Would this make sense as an initial way to integrate TCM with capabilities framework?
>>>
>>> On Fri, Dec 20, 2024 at 4:53 AM Benedict wrote:
>>>
>>>> If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information.
>>>>
>>>> We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here.
>>>>
>>>> Regarding disagreements: if you use LWTs then there are no consistency issues to worry about.
>>>>
>>>> Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail. It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. Perhaps we could extend its APIs to less critical services without intertwining them with membership, schema and epoch handling.
>>>>
>>>> On 20 Dec 2024, at 09:43, Štefan Miklošovič wrote:
>>>>
>>>>> I find TCM way more comfortable to work with. The capability of log being replayed on restart and catching up with everything else automatically is god-sent. If we had that on "good old distributed tables", then is it not true that we would need to take extra care of that, e.g. we would need to repair it etc ... It might be the source of the discrepancies / disagreements etc. TCM is just "maintenance-free" and _just works_.
>>>>>
>>>>> I think I was also investigating distributed tables but was just pulled towards TCM naturally because of its goodies.
>>>>>
>>>>> On Fri, Dec 20, 2024 at 10:08 AM Benedict wrote:
>>>>>
>>>>>> TCM is a perfectly valid basis for this, but TCM is only really *necessary* to solve meta config problems where we can’t rely on the rest of the database working. Particularly placement issues, which is why schema and membership need to live there.
>>>>>>
>>>>>> It should be possible to use distributed system tables just fi

Re: Capabilities

2024-12-20 Thread Benedict

Mostly conceptual; the problem with a linearizable history is that if you lose some of it (e.g. because some logic bug prevents you from processing some epoch) you stop the world until an operator can step in to perform surgery about what the history should be.

I do know of one recent bug to schema changes in CEP-15 that broke TCM in this way. That particular avenue will be hardened, but the fewer places we risk this the better IMO.

Of course, there are steps we could take to expose a limited API targeting these use cases, as well as using a separate log for ancillary functionality, that might better balance risk:reward. But equally I’m not sure it makes sense to TCM all the things, and maybe dogfooding our own database features and developing functionality that enables our own use cases could be better where it isn’t necessary 🤷‍♀️

On 20 Dec 2024, at 19:22, Jordan West wrote:

> On Fri, Dec 20, 2024 at 11:06 AM Benedict wrote:
>
>> If TCM breaks we all have a really bad time, much worse than if any one of these features individually has problems. If you break TCM in the right way the cluster could become inoperable, or operations like topology changes may be prevented.
>
> Benedict, when you say this, are you speaking hypothetically (in the sense that by using TCM more we increase the probability of using it "wrong" and hitting an unknown edge case) or are there known ways today that TCM "breaks"?
>
> Jordan
>
>> This means that even a parallel log has some risk if we end up modifying shared functionality.
>>
>> On 20 Dec 2024, at 18:47, Štefan Miklošovič wrote:
>>
>>> I stand corrected. C in TCM is "cluster" :D Anyway. Configuration is super reasonable to be put there.
>>>
>>> On Fri, Dec 20, 2024 at 7:42 PM Štefan Miklošovič wrote:
>>>
>>>> I am super hesitant to base distributed guardrails or any configuration for that matter on anything but TCM. Does not "C" in TCM stand for "configuration" anyway? So rename it to TSM like "schema" then if it is meant to be just for that. It seems to be quite ridiculous to code tables with caches on top when we have way more effective tooling thanks to CEP-21 to deal with that with clear advantages of getting rid of all of that old mechanism we have in place.
>>>>
>>>> I have not seen any concrete examples of risks why using TCM should be just for what it is currently for. Why not put the configuration meant to be cluster-wide into that?
>>>>
>>>> What is it ... performance? What does even the term "additional complexity" mean? Complex in what? Do you think that putting there 3 types of transformations in case of guardrails which flip some booleans and numbers would suddenly make TCM way more complex? Come on ...
>>>>
>>>> This has nothing to do with what Jordan is trying to introduce. I think we all agree he knows what he is doing and if he evaluates that TCM is too much for his use case (or it is not a good fit) that is perfectly fine.
>>>>
>>>> On Fri, Dec 20, 2024 at 7:22 PM Paulo Motta wrote:
>>>>
>>>>> > It should be possible to use distributed system tables just fine for capabilities, config and guardrails.
>>>>>
>>>>> I have been thinking about this recently and I agree we should be wary about introducing new TCM states and create additional complexity that can be serviced by existing data dissemination mechanisms (gossip/system tables). I would prefer that we take a more phased and incremental approach to introduce new TCM states.
>>>>>
>>>>> As a way to accomplish that, I have thought about introducing a new generic TCM state "In Maintenance", where schema or membership changes are "frozen/disallowed" while an external operation is taking place. This "external operation" could mean many things:
>>>>> - Upgrade
>>>>> - Downgrade
>>>>> - Migration
>>>>> - Capability Enablement/Disablement
>>>>>
>>>>> These could be sub-states of the "Maintenance" TCM state, that could be managed externally (via cache/gossip/system tables/sidecar). Once these sub-states are validated thoroughly and mature enough, we could "promote" them to top-level TCM states.
>>>>>
>>>>> In the end what really matters is that cluster and schema membership changes do not happen while a miscellaneous operation is taking place.
>>>>>
>>>>> Would this make sense as an initial way to integrate TCM with capabilities framework?
>>>>>
>>>>> On Fri, Dec 20, 2024 at 4:53 AM Benedict wrote:
>>>>>
>>>>>> If you perform a read from a distributed table on startup you will find the latest information. What catchup are you thinking of? I don’t think any of the features we talked about need a log, only the latest information.
>>>>>>
>>>>>> We can (and should) probably introduce event listeners for distributed tables, as this is also a really great feature, but I don’t think this should be necessary here.
>>>>>>
>>>>>> Regarding disagreements: if you use LWTs then there are no consistency issues to worry about.
>>>>>>
>>>>>> Again, I’m not opposed to using TCM, although I am a little worried TCM is becoming our new hammer with everything a nail. It would be better IMO to keep TCM scoped to essential functionality as it’s critical to correctness. Perhaps we could extend

[DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Caleb Rackliffe

Some of you are probably familiar with work in the DS fork to improve the
selection of indexes for SAI queries in
https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424
.

While I'm eagerly anticipating working on that in the new year, I'm also
wondering whether we think some simple CQL extensions to manually control
index selection would be helpful. Maxwell proposed this a while back
in CASSANDRA-18112, and I'd like to propose a syntax:


ex. Do not use the specified index during the query.

SELECT ... FROM ... WHERE ... WITHOUT INDEX 

This could be helpful for intersection queries where one of the provided
clauses is not very selective and could simply be handled via
post-filtering.
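
To make the post-filtering point concrete, here is a rough Python sketch over
an in-memory toy model. It is not Cassandra or SAI code: the rows, the column
names (status, user_id), the helper functions, and the "index entries read"
cost measure are all assumptions made up for illustration.

```python
from collections import defaultdict

def build_index(rows, column):
    """Toy secondary index: column value -> set of matching row ids."""
    index = defaultdict(set)
    for row_id, row in rows.items():
        index[row[column]].add(row_id)
    return index

def query_intersect(indexes, predicates):
    """Use every index and intersect the posting lists."""
    postings = [indexes[c].get(v, set()) for c, v in predicates.items()]
    entries_read = sum(len(p) for p in postings)
    return set.intersection(*postings), entries_read

def query_without_index(rows, indexes, predicates, skipped):
    """Skip the index on `skipped` and apply that predicate as a post-filter."""
    used = {c: v for c, v in predicates.items() if c != skipped}
    postings = [indexes[c].get(v, set()) for c, v in used.items()]
    entries_read = sum(len(p) for p in postings)
    candidates = set.intersection(*postings)
    matches = {r for r in candidates if rows[r][skipped] == predicates[skipped]}
    return matches, entries_read

# Hypothetical data: 900 of 1000 rows are "active" (not selective),
# while each user_id matches only 10 rows (selective).
rows = {i: {"status": "active" if i % 10 else "inactive", "user_id": i % 100}
        for i in range(1000)}
indexes = {c: build_index(rows, c) for c in ("status", "user_id")}
predicates = {"status": "active", "user_id": 7}

both, cost_both = query_intersect(indexes, predicates)
skip, cost_skip = query_without_index(rows, indexes, predicates, "status")
print(both == skip, cost_both, cost_skip)  # same rows; 910 vs 10 entries read
```

In this toy model both strategies return the same rows, but skipping the
non-selective index avoids reading its large posting list. Whether that holds
for a real query depends on the actual data distribution.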

ex. Require the specified index to be used.

SELECT ... FROM ... WHERE ... WITH INDEX 

This could be helpful in scenarios where multiple indexes exist on a column
and was the primary motivation for CASSANDRA-18112.

Thoughts?

