> On 5 Sep 2022, at 22:02, Henrik Ingo <henrik.i...@datastax.com> wrote:
>
> Mostly I just wanted to ack that at least someone read the doc (somewhat
> superficially sure, but some parts with thought...)

Thanks, it's a lot to digest, so we appreciate that people are working
through it.

> > One pre-feature that we would include in the preceding minor release is a
> > node level switch to disable all operations that modify cluster metadata
> > state. This would include schema changes as well as topology-altering events
> > like move, decommission or (gossip-based) bootstrap and would be activated on
> > all nodes for the duration of the major upgrade. If this switch were
> > accessible via internode messaging, activating it for an upgrade could be
> > automated. When an upgraded node starts up, it could send a request to
> > disable metadata changes to any peer still running the old version. This
> > would cost a few redundant messages, but simplify things operationally.
> > Although this approach would necessitate an additional minor version upgrade,
> > this is not without precedent and we believe that the benefits outweigh the
> > costs of additional operational overhead.
>
> Sounds like a great idea, and probably necessary in practice?

Although I think we _could_ manage without this, it would certainly simplify
this and future upgrades.

> > If this part of the proposal is accepted, we could also include further
> > messaging protocol changes in the minor release, as these would largely
> > constitute additional verbs which would be implemented with no-op verb
> > handlers initially. This would simplify the major version code, as it would
> > not need to gate the sending of asynchronous replication messages on the
> > receiver's release version. During the migration, it may be useful to have a
> > way to directly inject gossip messages into the cluster, in case the states
> > of the yet-to-be upgraded nodes become inconsistent. This isn't intended, so
> > such a tool may never be required, but we have seen that gossip propagation
> > can be difficult to reason about at times.
>
> Others will know the code better and I understand that adding new no-op verbs
> can be considered safe... But instinctively a bit hesitant on this one.
> Surely adding a few if statements to the upgraded version isn't that big of a
> deal?
>
> Also, it should make sense to minimize the dependencies from the previous
> major version (without CEP-21) to the new major version (with CEP-21). If a
> bug is found, it's much easier to fix code in the new major version than the
> old and supposedly stable one.

Yep, agreed. Adding verb handlers in advance may not buy us very much, so may
not be worth the risk of additionally perturbing the stable system. I would
say that having a means to directly manipulate gossip state during the upgrade
would be a useful safety net in case something unforeseen occurs and we need
to dig ourselves out of a hole. The precise scope of the feature & required
changes are not something we've given extensive thought to yet, so we'd want
to assess that carefully before proceeding.

> henrik
>
> --
> Henrik Ingo
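
For illustration only, the node-level switch discussed above could be sketched roughly as follows. This is a toy model in Python, not Cassandra code: `Node`, `Op`, `handle_disable_request` and `on_startup` are hypothetical names, the integer `major_version` stands in for the release version, and a direct method call stands in for an internode message. It only shows the intended behaviour: an upgraded node asks every peer still on the old version to stop accepting metadata-modifying operations, and repeating the request is harmless.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable


class Op(Enum):
    SCHEMA_CHANGE = auto()
    MOVE = auto()
    DECOMMISSION = auto()
    BOOTSTRAP = auto()
    READ = auto()  # an ordinary operation, unaffected by the switch


# Operations that modify cluster metadata, taken from the list in the proposal.
METADATA_OPS = frozenset(
    {Op.SCHEMA_CHANGE, Op.MOVE, Op.DECOMMISSION, Op.BOOTSTRAP}
)


class MetadataChangesDisabled(Exception):
    """Raised when a metadata-modifying operation hits a disabled node."""


@dataclass
class Node:
    name: str
    major_version: int  # assumption: a plain integer stands in for the release
    metadata_changes_enabled: bool = True

    def handle_disable_request(self) -> None:
        # Idempotent: receiving the request more than once changes nothing.
        self.metadata_changes_enabled = False

    def execute(self, op: Op) -> str:
        if op in METADATA_OPS and not self.metadata_changes_enabled:
            raise MetadataChangesDisabled(f"{self.name} rejected {op.name}")
        return f"{self.name}: {op.name} ok"


def on_startup(upgraded: Node, peers: Iterable[Node]) -> int:
    """Ask every peer on an older version to disable metadata changes.

    Returns the number of requests sent. Requests to peers that are already
    disabled are the "few redundant messages" mentioned in the proposal.
    """
    sent = 0
    for peer in peers:
        if peer.major_version < upgraded.major_version:
            peer.handle_disable_request()  # stands in for an internode message
            sent += 1
    return sent
```

With two nodes on the old version and one upgraded node, calling `on_startup(upgraded, peers)` disables metadata changes on both old nodes. After that, `execute(Op.READ)` still succeeds on them while `execute(Op.SCHEMA_CHANGE)` raises `MetadataChangesDisabled`.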