Hi Sam, this is a great idea and a really well-described CEP!

I have some questions. Perhaps they reflect my weak understanding, but maybe you can answer:

Is it going to work so that each node reads the log individually and tries to catch up, applying a transition locally once the previous change is confirmed on the majority of the affected nodes? If so, will there be a replica group explicitly associated with each event (an explicit list of the nodes affected by the change, plus a list of those which have already applied it), so that each node can individually decide whether to move forward? If so, can a node skip a transformation which does not affect it and move forward, thus applying another change concurrently?
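
To make the question concrete, here is a rough Python sketch of the model I have in mind. It is purely my assumption and is not taken from the CEP: `Event`, `confirmed` and `next_applicable` are hypothetical names, and "majority" here means a strict majority of the nodes an event affects.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One entry in the metadata log (hypothetical shape, not from the CEP)."""
    name: str
    affected: frozenset                            # nodes that must act on this event
    applied_by: set = field(default_factory=set)   # nodes that have applied it

    def confirmed(self) -> bool:
        # Confirmed once a strict majority of the affected nodes applied it.
        return len(self.applied_by & self.affected) > len(self.affected) // 2


def next_applicable(log, node, position):
    """Return the index of the next event `node` may apply, or None.

    `position` is the index of the first log entry the node has not yet
    processed. Events that do not affect `node` are skipped without waiting.
    None means the node is blocked or has nothing left to apply.
    """
    for i in range(position, len(log)):
        event = log[i]
        if node not in event.affected:
            continue  # skip: this transformation does not touch the node
        # The most recent earlier event that also affected this node
        # must be confirmed before the node moves on.
        previous = [e for e in log[:i] if node in e.affected]
        if previous and not previous[-1].confirmed():
            return None
        return i
    return None


log = [
    Event("move", frozenset({"n1", "n2", "n3"})),
    Event("schema", frozenset({"n4", "n5"})),
    Event("decommission", frozenset({"n1", "n2", "n3"})),
]
```

In this model, `n4` can go straight to the second event because the first one does not affect it, while `n1` cannot start the third event until two of `n1`, `n2`, `n3` have applied the first.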
What if the failure of one or more nodes prevents progress over the log? For example, we are unable to get a majority of the nodes which process an event, so we cannot move forward. We cannot remove those nodes, though, because the removal would come later in the log and we cannot make progress. I've read about manual intervention, but maybe it can be avoided in some cases, for example by adding no more than one pending event to the log?

For multistep actions: are they going to be added all or none? If they are added one by one, can they be interleaved with other multistep actions?

> Reconfiguration itself occurs using the process that is analogous to
> "regular" bootstrap and also uses Paxos as a linearizability mechanism,
> except for there is no concept of "token" ownership in CMS; all CMS nodes
> own an entire range from MIN to MAX token. This means that during
> bootstrap, we do not have to split ranges, or have some nodes "lose" a part
> of the ring...

This sounds like an implementation of an everywhere replication strategy, doesn't it?

- - -- --- ----- -------- -------------
Jacek Lewandowski


On Tue, Sep 6, 2022 at 9:19 AM Sam Tunnicliffe <s...@beobal.com> wrote:

>
> On 5 Sep 2022, at 22:02, Henrik Ingo <henrik.i...@datastax.com> wrote:
>
> Mostly I just wanted to ack that at least someone read the doc (somewhat superficially sure, but some parts with thought...)
>
> Thanks, it's a lot to digest, so we appreciate that people are working through it.
>
>> One pre-feature that we would include in the preceding minor release is a node level switch to disable all operations that modify cluster metadata state. This would include schema changes as well as topology-altering events like move, decommission or (gossip-based) bootstrap and would be activated on all nodes for the duration of the major upgrade. If this switch were accessible via internode messaging, activating it for an upgrade could be automated.
>> When an upgraded node starts up, it could send a request to disable metadata changes to any peer still running the old version. This would cost a few redundant messages, but simplify things operationally.
>>
>> Although this approach would necessitate an additional minor version upgrade, this is not without precedent and we believe that the benefits outweigh the costs of additional operational overhead.
>
> Sounds like a great idea, and probably necessary in practice?
>
> Although I think we _could_ manage without this, it would certainly simplify this and future upgrades.
>
>> If this part of the proposal is accepted, we could also include further messaging protocol changes in the minor release, as these would largely constitute additional verbs which would be implemented with no-op verb handlers initially. This would simplify the major version code, as it would not need to gate the sending of asynchronous replication messages on the receiver's release version. During the migration, it may be useful to have a way to directly inject gossip messages into the cluster, in case the states of the yet-to-be upgraded nodes become inconsistent. This isn't intended, so such a tool may never be required, but we have seen that gossip propagation can be difficult to reason about at times.
>
> Others will know the code better and I understand that adding new no-op verbs can be considered safe... But instinctively a bit hesitant on this one. Surely adding a few if statements to the upgraded version isn't that big of a deal?
>
> Also, it should make sense to minimize the dependencies from the previous major version (without CEP-21) to the new major version (with CEP-21). If a bug is found, it's much easier to fix code in the new major version than the old and supposedly stable one.
>
> Yep, agreed. Adding verb handlers in advance may not buy us very much, so may not be worth the risk of additionally perturbing the stable system.
> I would say that having a means to directly manipulate gossip state during the upgrade would be a useful safety net in case something unforeseen occurs and we need to dig ourselves out of a hole. The precise scope of the feature & required changes are not something we've given extensive thought to yet, so we'd want to assess that carefully before proceeding.
>
> henrik
>
> --
> Henrik Ingo
> +358 40 569 7354
>