Re: [DISCUSS] CEP-21: Transactional Cluster Metadata

Jacek Lewandowski Tue, 06 Sep 2022 07:25:34 -0700

Hi Sam, this is a great idea and a really well-described CEP!

I have some questions; perhaps they reflect my weak understanding, but
maybe you can answer them:
Will it work such that each node reads the log individually and tries to
catch up, applying a transition locally once the previous change has been
confirmed by a majority of the affected nodes? If so, will each event have
an explicitly associated replica group (an explicit list of the nodes
affected by the change, plus a list of those that have already applied it),
so that each node can decide individually whether to move forward? And if
so, can a node skip a transformation that does not affect it and move
forward, thus making another change concurrently?
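To make the first question concrete, here is a minimal Java sketch of the catch-up rule as I read it. Everything in it is hypothetical: LogEntry, CatchUpRule, mayApply and the "majority of affected nodes" rule are assumptions for illustration, not CEP-21's design. A node may apply entry i only once the preceding entry has been applied by a majority of the nodes it affects; in skip mode it waits only for the most recent earlier entry that affects it.

```java
import java.util.*;

// Hypothetical model of the catch-up rule described in the question above;
// the names and the rule itself are assumptions, not CEP-21's design.
final class LogEntry {
    final Set<String> affected;                  // nodes this change touches
    final Set<String> applied = new HashSet<>(); // affected nodes that applied it

    LogEntry(Set<String> affected) {
        this.affected = Set.copyOf(affected);
    }

    // Only nodes that the change affects count towards its majority.
    void markApplied(String node) {
        if (affected.contains(node)) applied.add(node);
    }

    boolean confirmedByMajority() {
        return applied.size() > affected.size() / 2;
    }
}

final class CatchUpRule {
    // May `node` apply entry `i`? Strict mode waits for the immediately
    // preceding entry; skip mode only waits for the most recent earlier
    // entry that actually affects `node`.
    static boolean mayApply(List<LogEntry> log, int i, String node, boolean skipUnaffected) {
        for (int j = i - 1; j >= 0; j--) {
            LogEntry previous = log.get(j);
            if (skipUnaffected && !previous.affected.contains(node)) {
                continue; // change does not touch this node: do not wait for it
            }
            return previous.confirmedByMajority();
        }
        return true; // nothing earlier to wait for
    }
}
```

For example, if entry 0 affects A, B and C but only A has applied it, node D waiting on entry 1 is blocked in strict mode but may proceed in skip mode; whether skip mode is safe is exactly what the question asks.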



What if the failure of one or more nodes prevents progress through the log?
For example, we may be unable to get a majority of nodes to process an
event, so we cannot move forward. We cannot remove those nodes either,
because the removal would come later in the log, which we cannot reach. I've
read about manual intervention, but maybe it can be avoided in some cases,
for example by allowing no more than one pending event in the log?
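The "no more than one pending event" idea can be sketched as an admission rule on the log. This is a minimal, hypothetical Java model: SinglePendingLog, tryAppend and ack are illustrative names, and "confirmed" is assumed to mean acknowledged by a majority of the entry's affected nodes. A new entry is rejected while the newest entry is still unconfirmed.

```java
import java.util.*;

// Hypothetical admission rule: at most one unconfirmed ("pending") entry.
// Names are illustrative only.
final class SinglePendingLog {
    private static final class Entry {
        final Set<String> affected;
        final Set<String> acked = new HashSet<>();

        Entry(Set<String> affected) { this.affected = Set.copyOf(affected); }

        boolean confirmed() { return acked.size() > affected.size() / 2; }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Each entry is appended only after its predecessor is confirmed, and
    // acks only accumulate, so checking the newest entry is sufficient.
    boolean tryAppend(Set<String> affected) {
        if (!entries.isEmpty() && !entries.get(entries.size() - 1).confirmed()) {
            return false; // an entry is already pending
        }
        entries.add(new Entry(affected));
        return true;
    }

    // Acknowledgements from nodes the entry does not affect are ignored.
    void ack(int index, String node) {
        Entry e = entries.get(index);
        if (e.affected.contains(node)) e.acked.add(node);
    }

    int size() { return entries.size(); }
}
```

The rule bounds the stuck work rather than removing it: if a majority of an entry's affected nodes never acknowledge it, nothing else, including a removal, can be appended in this model.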

For multistep actions: are all of their steps added together (all or none)?
If they are added one by one, can they be interleaved with the steps of
other multistep actions?

> Reconfiguration itself occurs using the process that is analogous to
> "regular" bootstrap and also uses Paxos as a linearizability mechanism,
> except for there is no concept of "token" ownership in CMS; all CMS nodes
> own an entire range from MIN to MAX token. This means that during
> bootstrap, we do not have to split ranges, or have some nodes "lose" a part
> of the ring...


This sounds like an implementation of an everywhere replication strategy,
doesn't it?
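A small model shows the difference being pointed out, assuming 64-bit tokens. CmsPlacement and RingPlacement are hypothetical names: in the first, every CMS member replicates the whole MIN..MAX range, as with an everywhere-style strategy; in the second, a token ring, a bootstrapping node takes part of an existing node's range.

```java
import java.util.*;

// Hypothetical illustration; class names are not Cassandra's.
// Tokens are modelled as 64-bit longs.
final class CmsPlacement {
    private final List<String> members;

    CmsPlacement(List<String> members) { this.members = List.copyOf(members); }

    // Every member owns the whole MIN..MAX range, so the token is irrelevant.
    List<String> replicasFor(long token) { return members; }

    // Adding a member splits no range and takes ownership from no one.
    CmsPlacement withMember(String node) {
        List<String> next = new ArrayList<>(members);
        next.add(node);
        return new CmsPlacement(next);
    }
}

// For contrast: a token ring where a key belongs to the first node whose
// token is >= the key's token, wrapping around past the largest token.
final class RingPlacement {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void add(long token, String node) { ring.put(token, node); }

    String primaryFor(long token) {
        Map.Entry<Long, String> owner = ring.ceilingEntry(token);
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }
}
```

In the ring, a node joining at token 50 between nodes at 0 and 100 takes over (0, 50] from the node at 100; in the CMS model, joining only grows the replica list.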


- - -- --- ----- -------- -------------
Jacek Lewandowski


On Tue, Sep 6, 2022 at 9:19 AM Sam Tunnicliffe <s...@beobal.com> wrote:
>
>
>
> On 5 Sep 2022, at 22:02, Henrik Ingo <henrik.i...@datastax.com> wrote:
>
> Mostly I just wanted to ack that at least someone read the doc (somewhat
> superficially, sure, but some parts with thought...)
>
>
> Thanks, it's a lot to digest, so we appreciate that people are working
> through it.
>>
>> One pre-feature that we would include in the preceding minor release is
>> a node-level switch to disable all operations that modify cluster metadata
>> state. This would include schema changes as well as topology-altering
>> events like move, decommission or (gossip-based) bootstrap and would be
>> activated on all nodes for the duration of the major upgrade. If this
>> switch were accessible via internode messaging, activating it for an
>> upgrade could be automated. When an upgraded node starts up, it could send
>> a request to disable metadata changes to any peer still running the old
>> version. This would cost a few redundant messages, but simplify things
>> operationally.
>>
>> Although this approach would necessitate an additional minor version
>> upgrade, this is not without precedent and we believe that the benefits
>> outweigh the costs of additional operational overhead.
>
>
> Sounds like a great idea, and probably necessary in practice?
>
>
>
> Although I think we _could_ manage without this, it would certainly
> simplify this and future upgrades.
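The node-level switch described above, together with its automated activation, might look roughly like this. It is a hypothetical Java sketch: MetadataChangeGuard, UpgradeCoordinator and integer major versions are assumptions, and the actual internode message is left out. Metadata-changing operations check the switch before proceeding, and an upgraded node works out which peers still run an older version and should be sent a disable request.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names exist in Cassandra.
final class MetadataChangeGuard {
    private final AtomicBoolean disabled = new AtomicBoolean(false);

    void disable() { disabled.set(true); }

    void enable() { disabled.set(false); }

    // Called at the start of schema changes, move, decommission, bootstrap, ...
    void checkAllowed(String operation) {
        if (disabled.get()) {
            throw new IllegalStateException(
                operation + " rejected: cluster metadata changes are disabled during upgrade");
        }
    }
}

final class UpgradeCoordinator {
    // On startup, an upgraded node selects the peers still running an older
    // major version; each would then be sent a "disable metadata changes" request.
    static List<String> peersToDisable(Map<String, Integer> peerMajorVersions, int localMajor) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, Integer> peer : peerMajorVersions.entrySet()) {
            if (peer.getValue() < localMajor) targets.add(peer.getKey());
        }
        Collections.sort(targets); // deterministic order, for readability only
        return targets;
    }
}
```

The redundant messages mentioned in the proposal correspond to every upgraded node repeating this step; since disabling is idempotent in this model, repeats are harmless.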
>>
>> If this part of the proposal is accepted, we could also include further
>> messaging protocol changes in the minor release, as these would largely
>> constitute additional verbs which would be implemented with no-op verb
>> handlers initially. This would simplify the major version code, as it would
>> not need to gate the sending of asynchronous replication messages on the
>> receiver's release version. During the migration, it may be useful to have
>> a way to directly inject gossip messages into the cluster, in case the
>> states of the yet-to-be-upgraded nodes become inconsistent. This isn't
>> intended, so such a tool may never be required, but we have seen that
>> gossip propagation can be difficult to reason about at times.
>
>
> Others will know the code better, and I understand that adding new no-op
> verbs can be considered safe... But instinctively I'm a bit hesitant on
> this one. Surely adding a few if statements to the upgraded version isn't
> that big of a deal?
>
> Also, it makes sense to minimize the dependencies from the previous
> major version (without CEP-21) to the new major version (with CEP-21). If a
> bug is found, it's much easier to fix code in the new major version than in
> the old and supposedly stable one.
>
>
> Yep, agreed. Adding verb handlers in advance may not buy us very much, so
> it may not be worth the risk of additionally perturbing the stable system. I
> would say that having a means to directly manipulate gossip state during
> the upgrade would be a useful safety net in case something unforeseen
> occurs and we need to dig ourselves out of a hole. The precise scope of the
> feature & required changes are not something we've given extensive thought
> to yet, so we'd want to assess that carefully before proceeding.
>
> henrik
>
> --
> Henrik Ingo
> +358 40 569 7354
>
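The trade-off discussed above, between shipping no-op verb handlers in the minor release and gating on the receiver's version in the new major version, can be modelled as follows. This is a hypothetical Java sketch: Verb, VerbRegistry, MetadataReplicator and the integer version check are illustrative, not Cassandra's messaging API, and in this model a verb with no registered handler is simply an error on the receiver.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical model; Verb values and class names are illustrative,
// not Cassandra's messaging API.
enum Verb { GOSSIP, METADATA_REPLICATE }

final class VerbRegistry {
    private final Map<Verb, Consumer<String>> handlers = new EnumMap<>(Verb.class);

    void register(Verb verb, Consumer<String> handler) { handlers.put(verb, handler); }

    // In this model a verb without a handler is an error on the receiver.
    void deliver(Verb verb, String payload) {
        Consumer<String> handler = handlers.get(verb);
        if (handler == null) throw new IllegalArgumentException("no handler for " + verb);
        handler.accept(payload);
    }
}

final class MetadataReplicator {
    private final boolean gateOnVersion;
    private final int firstVersionWithVerb;

    MetadataReplicator(boolean gateOnVersion, int firstVersionWithVerb) {
        this.gateOnVersion = gateOnVersion;
        this.firstVersionWithVerb = firstVersionWithVerb;
    }

    // Returns true if the message was handed to the peer.
    boolean replicate(VerbRegistry peer, int peerVersion, String payload) {
        if (gateOnVersion && peerVersion < firstVersionWithVerb) {
            return false; // the "few if statements" alternative: skip old peers
        }
        peer.deliver(Verb.METADATA_REPLICATE, payload);
        return true;
    }
}
```

With a no-op handler registered on old nodes, the ungated replicator can send unconditionally; without one, only the gated replicator is safe, at the cost of keeping the version check in the new major version's code.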
