Re: Code freeze starts 1st May. Anything to be addressed?

2022-04-27 Thread Sam Tunnicliffe
CASSANDRA-14582 added support for users to supply an arbitrary value for 
HOST_ID when booting a new node. IMO it's a pretty bad and potentially 
dangerous idea for the unique identifier to be settable in this way. Hint 
delivery is already routed by host id and there have been several JIRAs which 
have called for more fundamental reworking of cluster metadata using permanent 
opaque identifiers rather than IPs to address members (CASSANDRA-11559, 
CASSANDRA-15823, etc). Using host id for anything like that in future would be 
made much more difficult with this capability. 

Aside from the longer term implications, it seems that the feature as currently 
implemented has some issues. There doesn't appear to be any validation that a 
supplied host id isn't already in use by a live node, so it's trivial to 
trigger a collision, which can lead to divergent ring views between nodes and 
ultimately to data loss.

Although this landed in trunk almost 11 months ago it hasn't been included in a 
release yet, so I propose we revert it before cutting 4.1 (although, as the 
revert isn't a feature, I guess technically we could do that during the 
freeze). I'm not completely convinced about encoding metadata into host ids, 
but even if that is something we want to do, I don't think it's wise to 
completely remove control over the identifiers from Cassandra itself.  

Thanks, 
Sam

> On 25 Apr 2022, at 16:17, Ekaterina Dimitrova wrote:
> 
> Hi everyone,
> 
> Kind reminder that 1st May is around the corner. What does this mean? Our 
> code freeze starts on 1st May and my understanding is that only bug fixing 
> can go into the 4.1 branch. 
> If anyone has anything to raise, now is a good time. On my end I saw a few 
> things for this week that we should probably put to completion:
> - CASSANDRA-17571 - I have to close this one; it is in progress. New types in 
> Config are good to have in before the freeze, I guess, even if it is not a 
> yaml change.
> - CASSANDRA-17557 - we need to take care of the parameters so we don't have 
> to deprecate and support anything not actually needed; I think it is 
> probably more or less done.
> - CASSANDRA-17379 - adds a new flag around config; I think it is more or less 
> done, but it depends on final CI, and a second reviewer may be needed.
> - JMX intercept Cassandra exceptions - I think David mentioned a rebase was 
> needed.
> - CASSANDRA-17212 - The config property minimum_keyspace_rf and its 
> nodetool getter and setter commands are new to 4.1. They are suitable to be 
> ported to guardrails, and if we do this port in 4.1 we won't need to 
> deprecate that property and those nodetool commands in the next release, 
> just one release after their introduction.
> 
> I guess the failing tests we see could be fixed after the freeze, but no API 
> changes should go in after it.
> 
> Thanks everyone for all the hard work. Please don’t hesitate to raise the 
> flag with questions, concerns or any help needed. 
> 
> Best regards,
> Ekaterina



Re: Code freeze starts 1st May. Anything to be addressed?

2022-04-27 Thread Paulo Motta
Fully agree we should add a collision check, but I don't understand why this
optional feature is bad/dangerous once we add that check. Can you
provide an example of a potential issue?

I don't expect this property to be used by most users, only by power users
who normally know what they're doing. We have tons of potentially
dangerous knobs and I don't get why this particular one is any different.

On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe wrote:

> CASSANDRA-14582 added support for users to supply an arbitrary value for
> HOST_ID when booting a new node. IMO it's a pretty bad and potentially
> dangerous idea for the unique identifier to be settable in this way. Hint
> delivery is already routed by host id and there have been several JIRAs
> which have called for more fundamental reworking of cluster metadata using
> permanent opaque identifiers rather than IPs to address members
> (CASSANDRA-11559, CASSANDRA-15823, etc). Using host id for anything like
> that in future would be made much more difficult with this capability.
>
> Aside from the longer term implications, it seems that the feature as
> currently implemented has some issues. There doesn't appear to be any
> validation that a supplied host id isn't already in use by a live node, so
> it's trivial to trigger a collision which can lead to divergent ring views
> between nodes and ultimately in data loss.
>
> Although this landed in trunk almost 11 months ago it hasn't been included
> in a release yet, so I propose we revert it before cutting 4.1 (although,
> as the revert isn't a feature, I guess technically we could do that during
> the freeze). I'm not completely convinced about encoding metadata into host
> ids, but even if that is something we want to do, I don't think it's wise
> to completely remove control over the identifiers from Cassandra itself.
>
> Thanks,
> Sam
>
> On 25 Apr 2022, at 16:17, Ekaterina Dimitrova 
> wrote:
>
> Hi everyone,
>
> Kind reminder that 1st May is around the corner. What does this mean? Our
> code freeze starts on 1st May and my understanding is that only bug fixing
> can go into the 4.1 branch.
> If anyone has anything to raise, now is a good time. On my end I saw a few
> things for this week that we should probably put to completion:
> - CASSANDRA-17571  -
> I have to close this one, it is in progress; new types in Config is good to
> be in before the freeze I guess, even if It is not yaml change
> - CASSANDRA-17557  -
> we need to take care of the parameters so we don't have to deprecate and
>  support anything not actually needed; I think it is probably more or less
> done
> - CASSANDRA-17379  -
> adds a new flag around config; I think it is more or less done, depends on
> final CI and second reviewer maybe needed?
> - JMX intercept Cassandra exceptions, I think David mentioned a rebase was
> needed
> - CASSANDRA-17212 - The config property minimum_keyspace_rf and their
> nodetool getter and setter commands are new to 4.1. They are suitable to be
> ported to guardrails, and if we do this port in 4.1 we won't need to
> deprecate that property and nodetool commands in the next release, just one
> release after their introduction.
>
> I guess the failing tests we see could be fixed after the freeze but no
> API changes.
>
> Thanks everyone for all the hard work. Please don’t hesitate to raise the
> flag with questions, concerns or any help needed.
>
> Best regards,
> Ekaterina
>
>
>


Re: Code freeze starts 1st May. Anything to be addressed?

2022-04-27 Thread Paulo Motta
To clarify a bit more, I don't think that ticket adds the ability to encode
metadata into host IDs; Cassandra still interprets the host UUID as a
permanent opaque identifier. So I don't see why this would be a
problem once we add the necessary host_id collision check.

Users may want to control their node UUIDs for inventory purposes, so this
seems like a valid use case for this feature.

On Wed, 27 Apr 2022 at 14:20, Paulo Motta wrote:

> Fully agree we should add a collision check but I don't understand why
> this optional feature is bad/dangerous after we add this ability? Can you
> provide an example of a potential issue?
>
> I don't expect this property to be used by most users, except power users
> which normally know what they're doing. We have tons of potentially
> dangerous knobs and I don't get why this particular one is any different.
>
> On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe wrote:
>
>> CASSANDRA-14582 added support for users to supply an arbitrary value for
>> HOST_ID when booting a new node. IMO it's a pretty bad and potentially
>> dangerous idea for the unique identifier to be settable in this way. Hint
>> delivery is already routed by host id and there have been several JIRAs
>> which have called for more fundamental reworking of cluster metadata using
>> permanent opaque identifiers rather than IPs to address members
>> (CASSANDRA-11559, CASSANDRA-15823, etc). Using host id for anything like
>> that in future would be made much more difficult with this capability.
>>
>> Aside from the longer term implications, it seems that the feature as
>> currently implemented has some issues. There doesn't appear to be any
>> validation that a supplied host id isn't already in use by a live node, so
>> it's trivial to trigger a collision which can lead to divergent ring views
>> between nodes and ultimately in data loss.
>>
>> Although this landed in trunk almost 11 months ago it hasn't been
>> included in a release yet, so I propose we revert it before cutting 4.1
>> (although, as the revert isn't a feature, I guess technically we could do
>> that during the freeze). I'm not completely convinced about encoding
>> metadata into host ids, but even if that is something we want to do, I
>> don't think it's wise to completely remove control over the identifiers
>> from Cassandra itself.
>>
>> Thanks,
>> Sam
>>
>> On 25 Apr 2022, at 16:17, Ekaterina Dimitrova 
>> wrote:
>>
>> Hi everyone,
>>
>> Kind reminder that 1st May is around the corner. What does this mean? Our
>> code freeze starts on 1st May and my understanding is that only bug fixing
>> can go into the 4.1 branch.
>> If anyone has anything to raise, now is a good time. On my end I saw a
>> few things for this week that we should probably put to completion:
>> - CASSANDRA-17571  -
>> I have to close this one, it is in progress; new types in Config is good to
>> be in before the freeze I guess, even if It is not yaml change
>> - CASSANDRA-17557  -
>> we need to take care of the parameters so we don't have to deprecate and
>>  support anything not actually needed; I think it is probably more or less
>> done
>> - CASSANDRA-17379  -
>> adds a new flag around config; I think it is more or less done, depends on
>> final CI and second reviewer maybe needed?
>> - JMX intercept Cassandra exceptions, I think David mentioned a rebase
>> was needed
>> - CASSANDRA-17212 - The config property minimum_keyspace_rf and their
>> nodetool getter and setter commands are new to 4.1. They are suitable to be
>> ported to guardrails, and if we do this port in 4.1 we won't need to
>> deprecate that property and nodetool commands in the next release, just one
>> release after their introduction.
>>
>> I guess the failing tests we see could be fixed after the freeze but no
>> API changes.
>>
>> Thanks everyone for all the hard work. Please don’t hesitate to raise the
>> flag with questions, concerns or any help needed.
>>
>> Best regards,
>> Ekaterina
>>
>>
>>


Re: Code freeze starts 1st May. Anything to be addressed?

2022-04-27 Thread Sam Tunnicliffe
Like I mentioned, the possibility of easily introducing divergent views of the 
ring between live nodes is pretty dangerous, e.g. starting a new node with the 
same id as an existing live node will cause a collision. The existing node will 
not add the new node to the ring (although it will remain in gossip). Other 
nodes will remove the existing node from token metadata, but won't mark it 
down. There's no requirement for the new node to have the same tokens as the 
existing one either, so the topology has just completely changed without any 
constraints or movement of existing data. Subsequent reads and writes will be 
directed to different replica sets, depending on which coordinator they land 
on. Ownership of the host id, as well as the status of nodes in the token 
metadata of peers, will continue to flap if those nodes go down and come back 
up, because who rightfully owns the host id is resolved at startup time.
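
To make the divergence concrete, here is a minimal toy sketch of the scenario described above. It is not Cassandra code: the `replicas` function, the token values, and the endpoint names A to D are all assumptions made for illustration, and placement is reduced to "walk the ring clockwise from the first token at or after the key". The point is only that two coordinators holding different token metadata pick different replica sets for the same key.

```python
from bisect import bisect_left

def replicas(ring, key_token, rf):
    """Walk clockwise from the first token >= key_token and take rf distinct endpoints."""
    tokens = sorted(ring)
    start = bisect_left(tokens, key_token) % len(tokens)
    chosen = []
    for i in range(len(tokens)):
        endpoint = ring[tokens[(start + i) % len(tokens)]]
        if endpoint not in chosen:
            chosen.append(endpoint)
        if len(chosen) == rf:
            break
    return chosen

# View held by the existing node A: it refuses to add the new node D to its ring.
view_of_a = {0: "A", 100: "B", 200: "C"}
# View held by B and C: they drop A from token metadata and accept D,
# which reused A's host id but chose a different token.
view_of_others = {50: "D", 100: "B", 200: "C"}

key_token = 30
via_a = replicas(view_of_a, key_token, rf=2)       # request coordinated by A
via_b = replicas(view_of_others, key_token, rf=2)  # request coordinated by B or C

print(via_a, via_b)
```

In this toy setup a write coordinated by A and a read coordinated by B overlap on only one endpoint, and no data was moved to account for the change.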

As for things further down the line, it would be pretty untenable to base any 
new/improved cluster membership or data placement implementations on host id if 
the system isn't in control of assigning those. So even if only a handful of 
power users might actually make use of the feature, its very existence would 
constrain what we can assume/assert about host ids going forward. Given that 
drawback, the fact that this is a very niche feature makes it even less 
compelling.


> On 27 Apr 2022, at 18:20, Paulo Motta wrote:
> 
> Fully agree we should add a collision check but I don't understand why this 
> optional feature is bad/dangerous after we add this ability? Can you provide 
> an example of a potential issue?
> 
> I don't expect this property to be used by most users, except power users 
> which normally know what they're doing. We have tons of potentially dangerous 
> knobs and I don't get why this particular one is any different.
> 
> On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe wrote:
> CASSANDRA-14582 added support for users to supply an arbitrary value for 
> HOST_ID when booting a new node. IMO it's a pretty bad and potentially 
> dangerous idea for the unique identifier to be settable in this way. Hint 
> delivery is already routed by host id and there have been several JIRAs which 
> have called for more fundamental reworking of cluster metadata using 
> permanent opaque identifiers rather than IPs to address members 
> (CASSANDRA-11559, CASSANDRA-15823, etc). Using host id for anything like that 
> in future would be made much more difficult with this capability. 
> 
> Aside from the longer term implications, it seems that the feature as 
> currently implemented has some issues. There doesn't appear to be any 
> validation that a supplied host id isn't already in use by a live node, so 
> it's trivial to trigger a collision which can lead to divergent ring views 
> between nodes and ultimately in data loss.
> 
> Although this landed in trunk almost 11 months ago it hasn't been included in 
> a release yet, so I propose we revert it before cutting 4.1 (although, as the 
> revert isn't a feature, I guess technically we could do that during the 
> freeze). I'm not completely convinced about encoding metadata into host ids, 
> but even if that is something we want to do, I don't think it's wise to 
> completely remove control over the identifiers from Cassandra itself.  
> 
> Thanks, 
> Sam
> 
>> On 25 Apr 2022, at 16:17, Ekaterina Dimitrova wrote:
>> 
>> Hi everyone,
>> 
>> Kind reminder that 1st May is around the corner. What does this mean? Our 
>> code freeze starts on 1st May and my understanding is that only bug fixing 
>> can go into the 4.1 branch. 
>> If anyone has anything to raise, now is a good time. On my end I saw a few 
>> things for this week that we should probably put to completion:
>> - CASSANDRA-17571  - 
>> I have to close this one, it is in progress; new types in Config is good to 
>> be in before the freeze I guess, even if It is not yaml change
>> - CASSANDRA-17557  - 
>> we need to take care of the parameters so we don't have to deprecate and  
>> support anything not actually needed; I think it is probably more or less 
>> done
>> - CASSANDRA-17379  - 
>> adds a new flag around config; I think it is more or less done, depends on 
>> final CI and second reviewer maybe needed? 
>> - JMX intercept Cassandra exceptions, I think David mentioned a rebase was 
>> needed
>> - CASSANDRA-17212 - The config property minimum_keyspace_rf and their 
>> nodetool getter and setter commands are new to 4.1. They are suitable to be 
>> ported to guardrails, and if we do this port in 4.1 we won't need to 
>> deprecate that property and nodetool commands in the next release, just one 
>> release after their introduction.
>

Re: Code freeze starts 1st May. Anything to be addressed?

2022-04-27 Thread Paulo Motta
> starting a new node with the same id as an existing live node will cause
> a collision

Isn't this fixed if we add a simple collision check against existing host
ids? We can file a bug report and add this check, which should be fairly
straightforward.
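
A minimal sketch of what such a check might look like, assuming the booting node already holds a map of host id to endpoint learned from its peers. The names `check_host_id`, `HostIdCollision`, and `known_host_ids` are hypothetical and are not Cassandra's API. The check can only see ids this node has already heard about.

```python
import uuid

class HostIdCollision(Exception):
    """Raised when a requested host id is already claimed by another endpoint."""

def check_host_id(requested, known_host_ids, own_endpoint):
    """Return the requested id, or raise if a different endpoint already owns it."""
    owner = known_host_ids.get(requested)
    if owner is not None and owner != own_endpoint:
        raise HostIdCollision(f"host id {requested} is already in use by {owner}")
    return requested

# Hypothetical membership as seen by the booting node.
known = {uuid.UUID(int=1): "10.0.0.1", uuid.UUID(int=2): "10.0.0.2"}

accepted = check_host_id(uuid.UUID(int=3), known, "10.0.0.9")
try:
    check_host_id(uuid.UUID(int=1), known, "10.0.0.9")
    rejected = False
except HostIdCollision:
    rejected = True
```

A node restarting with its own existing id passes the check, because the recorded owner matches its endpoint.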

> it would be pretty untenable to base any new/improved cluster membership
> or data placement implementations on host id if the system isn't in control
> of assigning those.

Do we intend to encode any information in the host UUID in the near future?
If not, I don't see why we can't just keep treating these as permanent
opaque UUIDs, as they always have been.

We can always remove this if we change the host identifier to be something
else in the future.

On Wed, 27 Apr 2022 at 14:38, Sam Tunnicliffe wrote:

> Like I mentioned, the possibility of easily introducing divergent views of
> the ring between live nodes is pretty dangerous, e.g. starting a new node
> with the same id as an existing live node will cause a collision. The
> existing node will not add the new node to the ring (although it will
> remain in gossip). Other nodes will remove the existing node from token
> metadata, but won't mark it down. There's no requirement for the new node
> to have the same tokens as the existing one either, so the topology has
> just completely changed without any constraints or movement of existing
> data. Subsequent reads and writes will be directed to different replica
> sets, depending on which coordinator they land on. The ownership of the
> host id as well as the status of nodes in the token metadata of peers will
> continue to flap if those nodes go down and come back up as the resolution
> of who rightfully owns the host id is decided on startup time.
>
> As for things further down the line, it would be pretty untenable to base
> any new/improved cluster membership or data placement implementations on
> host id if the system isn't in control of assigning those. So even if only
> a handful of power users might actually make use of the feature, its very
> existence would constrain what we can assume/assert about host ids going
> forward. Given that drawback, the fact that this is a very niche feature
> makes it even less compelling.
>
>
> On 27 Apr 2022, at 18:20, Paulo Motta  wrote:
>
> Fully agree we should add a collision check but I don't understand why
> this optional feature is bad/dangerous after we add this ability? Can you
> provide an example of a potential issue?
>
> I don't expect this property to be used by most users, except power users
> which normally know what they're doing. We have tons of potentially
> dangerous knobs and I don't get why this particular one is any different.
>
> On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe wrote:
>
>> CASSANDRA-14582 added support for users to supply an arbitrary value for
>> HOST_ID when booting a new node. IMO it's a pretty bad and potentially
>> dangerous idea for the unique identifier to be settable in this way. Hint
>> delivery is already routed by host id and there have been several JIRAs
>> which have called for more fundamental reworking of cluster metadata using
>> permanent opaque identifiers rather than IPs to address members
>> (CASSANDRA-11559, CASSANDRA-15823, etc). Using host id for anything like
>> that in future would be made much more difficult with this capability.
>>
>> Aside from the longer term implications, it seems that the feature as
>> currently implemented has some issues. There doesn't appear to be any
>> validation that a supplied host id isn't already in use by a live node, so
>> it's trivial to trigger a collision which can lead to divergent ring views
>> between nodes and ultimately in data loss.
>>
>> Although this landed in trunk almost 11 months ago it hasn't been
>> included in a release yet, so I propose we revert it before cutting 4.1
>> (although, as the revert isn't a feature, I guess technically we could do
>> that during the freeze). I'm not completely convinced about encoding
>> metadata into host ids, but even if that is something we want to do, I
>> don't think it's wise to completely remove control over the identifiers
>> from Cassandra itself.
>>
>> Thanks,
>> Sam
>>
>> On 25 Apr 2022, at 16:17, Ekaterina Dimitrova 
>> wrote:
>>
>> Hi everyone,
>>
>> Kind reminder that 1st May is around the corner. What does this mean? Our
>> code freeze starts on 1st May and my understanding is that only bug fixing
>> can go into the 4.1 branch.
>> If anyone has anything to raise, now is a good time. On my end I saw a
>> few things for this week that we should probably put to completion:
>> - CASSANDRA-17571  -
>> I have to close this one, it is in progress; new types in Config is good to
>> be in before the freeze I guess, even if It is not yaml change
>> - CASSANDRA-17557  -
>> we need to take care of the parameters so we don't have to deprecate an

Re: Code freeze starts 1st May. Anything to be addressed?

2022-04-27 Thread bened...@apache.org
One reason might be compatibility – this may (I hope _will_) migrate to a 
simple integer of low cardinality in future, which would be a breaking change. 
This identifier will likely be used by Accord for correctness, too, and doing 
something wrong with it could have severe consequences, so at the very least it 
should be hard to access.

We could of course have two different host ids, one for the user to set to 
identify the host in some way for them, and another one for internal usage, but 
I’m not sure that’s a great idea.

From: Paulo Motta 
Date: Wednesday, 27 April 2022 at 18:20
To: Cassandra DEV 
Subject: Re: Code freeze starts 1st May. Anything to be addressed?
Fully agree we should add a collision check but I don't understand why this 
optional feature is bad/dangerous after we add this ability? Can you provide an 
example of a potential issue?
I don't expect this property to be used by most users, except power users which 
normally know what they're doing. We have tons of potentially dangerous knobs 
and I don't get why this particular one is any different.

On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe wrote:
CASSANDRA-14582 added support for users to supply an arbitrary value for 
HOST_ID when booting a new node. IMO it's a pretty bad and potentially 
dangerous idea for the unique identifier to be settable in this way. Hint 
delivery is already routed by host id and there have been several JIRAs which 
have called for more fundamental reworking of cluster metadata using permanent 
opaque identifiers rather than IPs to address members (CASSANDRA-11559, 
CASSANDRA-15823, etc). Using host id for anything like that in future would be 
made much more difficult with this capability.

Aside from the longer term implications, it seems that the feature as currently 
implemented has some issues. There doesn't appear to be any validation that a 
supplied host id isn't already in use by a live node, so it's trivial to 
trigger a collision which can lead to divergent ring views between nodes and 
ultimately in data loss.

Although this landed in trunk almost 11 months ago it hasn't been included in a 
release yet, so I propose we revert it before cutting 4.1 (although, as the 
revert isn't a feature, I guess technically we could do that during the 
freeze). I'm not completely convinced about encoding metadata into host ids, 
but even if that is something we want to do, I don't think it's wise to 
completely remove control over the identifiers from Cassandra itself.

Thanks,
Sam


On 25 Apr 2022, at 16:17, Ekaterina Dimitrova wrote:

Hi everyone,

Kind reminder that 1st May is around the corner. What does this mean? Our code 
freeze starts on 1st May and my understanding is that only bug fixing can go 
into the 4.1 branch.
If anyone has anything to raise, now is a good time. On my end I saw a few 
things for this week that we should probably put to completion:
- CASSANDRA-17571 - I 
have to close this one, it is in progress; new types in Config is good to be in 
before the freeze I guess, even if It is not yaml change
- CASSANDRA-17557 - we 
need to take care of the parameters so we don't have to deprecate and  support 
anything not actually needed; I think it is probably more or less done
- CASSANDRA-17379 - adds 
a new flag around config; I think it is more or less done, depends on final CI 
and second reviewer maybe needed?
- JMX intercept Cassandra exceptions, I think David mentioned a rebase was 
needed
- CASSANDRA-17212 - The config property minimum_keyspace_rf and their nodetool 
getter and setter commands are new to 4.1. They are suitable to be ported to 
guardrails, and if we do this port in 4.1 we won't need to deprecate that 
property and nodetool commands in the next release, just one release after 
their introduction.

I guess the failing tests we see could be fixed after the freeze but no API 
changes.

Thanks everyone for all the hard work. Please don’t hesitate to raise the flag 
with questions, concerns or any help needed.

Best regards,
Ekaterina



Re: Code freeze starts 1st May. Anything to be addressed?

2022-04-27 Thread Paulo Motta
> One reason might be compatibility – this may (I hope _will_) migrate to a
> simple integer of low cardinality in future, which would be a breaking
> change.

I look forward to this change, but won't we need to implement some backward
compatibility handling for legacy UUIDs anyway? The same backward
compatibility mechanism needed for system-provided UUIDs will work for
user-provided UUIDs.

> This identifier will likely be used by Accord for correctness, too, and
> doing something wrong with it could have severe consequences, so at the
> very least it should be hard to access.

The only potential issue I see is a host_id collision, which is easily
fixable by a simple collision check.

> We could of course have two different host ids, one for the user to set
> to identify the host in some way for them, and another one for internal
> usage, but I’m not sure that’s a great idea.

I don't think we need to keep the ability to set a host ID if we change the
ID representation, since it will be incompatible with externally-provided
UUIDs. We can just remove the feature and call it a day since the new
system will warrant a major version update anyway.

To be clear, I don't oppose reverting this if there are concerns about it.

On Wed, 27 Apr 2022 at 14:51, bened...@apache.org <
bened...@apache.org> wrote:

> One reason might be compatibility – this may (I hope _*will*_) migrate to
> a simple integer of low cardinality in future, which would be a breaking
> change. This identifier will likely be used by Accord for correctness, too,
> and doing something wrong with it could have severe consequences, so at the
> very least it should be hard to access.
>
>
>
> We could of course have two different host ids, one for the user to set to
> identify the host in some way for them, and another one for internal usage,
> but I’m not sure that’s a great idea.
>
>
>
> *From: *Paulo Motta 
> *Date: *Wednesday, 27 April 2022 at 18:20
> *To: *Cassandra DEV 
> *Subject: *Re: Code freeze starts 1st May. Anything to be addressed?
>
> Fully agree we should add a collision check but I don't understand why
> this optional feature is bad/dangerous after we add this ability? Can you
> provide an example of a potential issue?
>
> I don't expect this property to be used by most users, except power users
> which normally know what they're doing. We have tons of potentially
> dangerous knobs and I don't get why this particular one is any different.
>
>
>
> On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe wrote:
>
> CASSANDRA-14582 added support for users to supply an arbitrary value for
> HOST_ID when booting a new node. IMO it's a pretty bad and potentially
> dangerous idea for the unique identifier to be settable in this way. Hint
> delivery is already routed by host id and there have been several JIRAs
> which have called for more fundamental reworking of cluster metadata using
> permanent opaque identifiers rather than IPs to address members
> (CASSANDRA-11559, CASSANDRA-15823, etc). Using host id for anything like
> that in future would be made much more difficult with this capability.
>
>
>
> Aside from the longer term implications, it seems that the feature as
> currently implemented has some issues. There doesn't appear to be any
> validation that a supplied host id isn't already in use by a live node, so
> it's trivial to trigger a collision which can lead to divergent ring views
> between nodes and ultimately in data loss.
>
>
>
> Although this landed in trunk almost 11 months ago it hasn't been included
> in a release yet, so I propose we revert it before cutting 4.1 (although,
> as the revert isn't a feature, I guess technically we could do that during
> the freeze). I'm not completely convinced about encoding metadata into host
> ids, but even if that is something we want to do, I don't think it's wise
> to completely remove control over the identifiers from Cassandra itself.
>
>
>
> Thanks,
>
> Sam
>
>
>
> On 25 Apr 2022, at 16:17, Ekaterina Dimitrova 
> wrote:
>
>
>
> Hi everyone,
>
>
>
> Kind reminder that 1st May is around the corner. What does this mean? Our
> code freeze starts on 1st May and my understanding is that only bug fixing
> can go into the 4.1 branch.
>
> If anyone has anything to raise, now is a good time. On my end I saw a few
> things for this week that we should probably put to completion:
>
> - CASSANDRA-17571  -
> I have to close this one, it is in progress; new types in Config is good to
> be in before the freeze I guess, even if It is not yaml change
>
> - CASSANDRA-17557  -
> we need to take care of the parameters so we don't have to deprecate and
>  support anything not actually needed; I think it is probably more or less
> done
>
> - CASSANDRA-17379  -
> adds a new flag around config; I think it is more or less done, depends on
>

Re: Code freeze starts 1st May. Anything to be addressed?

2022-04-27 Thread bened...@apache.org
> The same backward compatibility mechanism needed for system-provided UUIDs 
> will work for user-provided UUIDs.

By ignoring them, and assigning a different one? That seems confusing, and like 
the feature will in effect be short lived.

It’s one thing to upgrade, just once, a set of IDs that we control 
unilaterally, and quite another to sensibly handle some user input.

I should also note that collision detection is harder than you think. It needs 
to be reliable, which means we need to use distributed consensus to allocate 
these ids; it can’t just involve our usual “look in gossip” approach. So 
collision detection by itself is not a small thing to deliver in a few days IMO.
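
A toy sketch of why a purely local check is not enough, under loudly stated assumptions: `LocalView` stands in for a node consulting only what it has heard so far, and `SingleRegistry` stands in for the agreed outcome of some consensus protocol. Neither class corresponds to real Cassandra or Accord code, and the single-threaded registry only models the result of agreement, not how agreement is reached.

```python
class LocalView:
    """A booting node that checks only the host ids it has heard about so far."""
    def __init__(self):
        self.seen = set()

    def try_claim(self, host_id):
        if host_id in self.seen:
            return False
        self.seen.add(host_id)
        return True

class SingleRegistry:
    """Stand-in for one agreed allocation record (the outcome of consensus)."""
    def __init__(self):
        self.owners = {}

    def try_claim(self, host_id, endpoint):
        # Claim-if-absent: the first endpoint to claim an id keeps it.
        return self.owners.setdefault(host_id, endpoint) == endpoint

# Two nodes boot at about the same time, before hearing about each other.
node_x, node_y = LocalView(), LocalView()
local_results = [node_x.try_claim("id-1"), node_y.try_claim("id-1")]

# The same two claims made against a single agreed record.
registry = SingleRegistry()
agreed_results = [registry.try_claim("id-1", "X"), registry.try_claim("id-1", "Y")]

print(local_results, agreed_results)
```

With local views both claims succeed and the collision goes unnoticed; against the single record the second claim is refused.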

From: Paulo Motta 
Date: Wednesday, 27 April 2022 at 19:09
To: Cassandra DEV 
Subject: Re: Code freeze starts 1st May. Anything to be addressed?
> One reason might be compatibility – this may (I hope _will_) migrate to a 
> simple integer of low cardinality in future, which would be a breaking change.

I look forward to this change, but won't we need to implement some backward 
compatibility handling for legacy UUIDs anyway? The same backward compatibility 
mechanism needed for system-provided UUIDs will work for user-provided UUIDs.

> This identifier will likely be used by Accord for correctness, too, and doing 
> something wrong with it could have severe consequences, so at the very least 
> it should be hard to access.

The only potentially issue I see is a host_id collision, which is easily 
fixable by a simple collision check.

> We could of course have two different host ids, one for the user to set to 
> identify the host in some way for them, and another one for internal usage, 
> but I’m not sure that’s a great idea.

I don't think we need to keep the ability to set a host ID if we change the ID 
representation, since it will be incompatible with externally-provided UUIDs. 
We can just remove the feature and call it a day since the new system will 
warrant a major version update anyway.
To be clear, I don't oppose reverting this if there are concerns about it.

On Wed, 27 Apr 2022 at 14:51, bened...@apache.org wrote:
One reason might be compatibility – this may (I hope _will_) migrate to a 
simple integer of low cardinality in future, which would be a breaking change. 
This identifier will likely be used by Accord for correctness, too, and doing 
something wrong with it could have severe consequences, so at the very least it 
should be hard to access.

We could of course have two different host ids, one for the user to set to 
identify the host in some way for them, and another one for internal usage, but 
I’m not sure that’s a great idea.

From: Paulo Motta <pauloricard...@gmail.com>
Date: Wednesday, 27 April 2022 at 18:20
To: Cassandra DEV <dev@cassandra.apache.org>
Subject: Re: Code freeze starts 1st May. Anything to be addressed?
Fully agree we should add a collision check, but I don't understand why this 
optional feature is bad/dangerous once we add that check. Can you provide an 
example of a potential issue?
I don't expect this property to be used by most users, except power users who 
normally know what they're doing. We have tons of potentially dangerous knobs 
and I don't get why this particular one is any different.

On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe 
<s...@beobal.com> wrote:
CASSANDRA-14582 added support for users to supply an arbitrary value for 
HOST_ID when booting a new node. IMO it's a pretty bad and potentially 
dangerous idea for the unique identifier to be settable in this way. Hint 
delivery is already routed by host id and there have been several JIRAs which 
have called for more fundamental reworking of cluster metadata using permanent 
opaque identifiers rather than IPs to address members (CASSANDRA-11559, 
CASSANDRA-15823, etc). Using host id for anything like that in future would be 
made much more difficult with this capability.

Aside from the longer term implications, it seems that the feature as currently 
implemented has some issues. There doesn't appear to be any validation that a 
supplied host id isn't already in use by a live node, so it's trivial to 
trigger a collision which can lead to divergent ring views between nodes and 
ultimately to data loss.

Although this landed in trunk almost 11 months ago it hasn't been included in a 
release yet, so I propose we revert it before cutting 4.1 (although, as the 
revert isn't a feature, I guess technically we could do that during the 
freeze). I'm not completely convinced about encoding metadata into host ids, 
but even if that is something we want to do, I don't think it's wise to 
completely remove control over the identifiers from Cassandra itself.

Thanks,
Sam

On 25 Apr 2022, at 16:17, Ekaterina Dimitrova 
<e.dimitr...@gmail.com> wrote:

Hi everyone,

Kind reminder that 1st May is around the corner. What

Re: Code freeze starts 1st May. Anything to be addressed?

2022-04-27 Thread Stefan Miklosovic
I will revert it as I committed it, before the freeze.

I have to admit the points you raise are all valid; this seems to be
harder than one might think. In this light, as it stands now, it is
pretty much a half-cooked solution doing potentially more harm than
good. The user had a request that "they just want to be comfy when
replacing the node" based on the rack / dc that node was in. Indeed, nice
to have, but as it is not bullet-proof enough, well, let's get rid of
it. I think security and robustness are more important than the user's
experience in this.

That ticket was implemented by Abi, our GSOC student, to just "get
into it" and it seemed like an innocent low-hanger to deliver. Well,
not so much.

On Wed, 27 Apr 2022 at 20:22, bened...@apache.org  wrote:
>
> > The same backward compatibility mechanism needed for system-provided UUIDs 
> > will work for user-provided UUIDs.
>
> By ignoring them, and assigning a different one? That seems confusing, and 
> like the feature will in effect be short lived.
>
>
>
> It’s one problem to upgrade, just once, a set of IDs that we control 
> unilaterally, and a very different one to sensibly handle some user input.
>
>
>
> I should also note that collision detection is harder than you think. It 
> needs to be reliable, which means we need to use distributed consensus to 
> allocate these ids; it can’t just involve our usual “look in gossip” 
> approach. So collision detection by itself is not a small thing to deliver in 
> a few days IMO.
>
>
>
> From: Paulo Motta 
> Date: Wednesday, 27 April 2022 at 19:09
> To: Cassandra DEV 
> Subject: Re: Code freeze starts 1st May. Anything to be addressed?
>
> > One reason might be compatibility – this may (I hope _will_) migrate to a 
> > simple integer of low cardinality in future, which would be a breaking 
> > change.
>
> I look forward to this change, but won't we need to implement some backward 
> compatibility handling for legacy UUIDs anyway? The same backward 
> compatibility mechanism needed for system-provided UUIDs will work for 
> user-provided UUIDs.
>
> > This identifier will likely be used by Accord for correctness, too, and 
> > doing something wrong with it could have severe consequences, so at the 
> > very least it should be hard to access.
>
> The only potential issue I see is a host_id collision, which is easily 
> fixable by a simple collision check.
>
> > We could of course have two different host ids, one for the user to set to 
> > identify the host in some way for them, and another one for internal usage, 
> > but I’m not sure that’s a great idea.
>
> I don't think we need to keep the ability to set a host ID if we change the 
> ID representation, since it will be incompatible with externally-provided 
> UUIDs. We can just remove the feature and call it a day since the new system 
> will warrant a major version update anyway.
>
> To be clear, I don't oppose reverting this if there are concerns about it.
>
>
>
> On Wed, 27 Apr 2022 at 14:51, bened...@apache.org wrote:
>
> One reason might be compatibility – this may (I hope _will_) migrate to a 
> simple integer of low cardinality in future, which would be a breaking 
> change. This identifier will likely be used by Accord for correctness, too, 
> and doing something wrong with it could have severe consequences, so at the 
> very least it should be hard to access.
>
>
>
> We could of course have two different host ids, one for the user to set to 
> identify the host in some way for them, and another one for internal usage, 
> but I’m not sure that’s a great idea.
>
>
>
> From: Paulo Motta 
> Date: Wednesday, 27 April 2022 at 18:20
> To: Cassandra DEV 
> Subject: Re: Code freeze starts 1st May. Anything to be addressed?
>
> Fully agree we should add a collision check, but I don't understand why this 
> optional feature is bad/dangerous once we add that check. Can you provide 
> an example of a potential issue?
>
> I don't expect this property to be used by most users, except power users 
> who normally know what they're doing. We have tons of potentially dangerous 
> knobs and I don't get why this particular one is any different.
>
>
>
> On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe wrote:
>
> CASSANDRA-14582 added support for users to supply an arbitrary value for 
> HOST_ID when booting a new node. IMO it's a pretty bad and potentially 
> dangerous idea for the unique identifier to be settable in this way. Hint 
> delivery is already routed by host id and there have been several JIRAs which 
> have called for more fundamental reworking of cluster metadata using 
> permanent opaque identifiers rather than IPs to address members 
> (CASSANDRA-11559, CASSANDRA-15823, etc). Using host id for anything like that 
> in future would be made much more difficult with this capability.
>
>
>
> Aside from the longer term implications, it seems that the feature as 
> currently implemented has some issues. There doesn't appear to be