Re: [DISCUSS] What SHOULD we do when we index an inet type that is ipv4?
Yeah, what we have with inet is much like if we had a type like "numeric" that allowed you to write both ints and doubles. If we had actual "inet4" and "inet6" types, SAI would have been able to index them as fixed-length values without doing the 4 -> 16 byte conversion.

Given SAI could easily change this to go one way or another at post-filtering time, perhaps there's another option:

4.) Have an option on the column index that allows the user to specify whether ipv4 and ipv6 addresses are comparable. If they are, nothing changes. If they aren't, we can just take the matches from the index and filter "strictly".

I'm not sure what's best here, because what it seems to hinge on is what users actually want to do when they throw both v4 and v6 addresses into a single column. Without any real loss in storage efficiency, you could index them in two separate columns on the same table, and none of this matters. If they are mixed, it feels like we should at least have the option to make them comparable, kind of like we have the option to make text case-insensitive or unicode normalized right now.

On Wed, Mar 6, 2024 at 4:35 PM Bowen Song via dev wrote:

> Technically, 127.0.0.1 (IPv4) is not 0:0:0:0:0::7f00:0001 (IPv6), but their values are equal. Just like 1.0 (double) is not 1 (int), but their values are equal. So, what is the meaning of "=" in CQL?
>
> On 06/03/2024 21:36, David Capwell wrote:
> > So, was reviewing SAI and found we convert ipv4 to ipv6 (which is valid for the type) and made me wonder what the behavior would be if client mixed ipv4 with ipv4 encoded as ipv6… this caused me to find a different behavior in SAI to the rest of C*… where I feel C* is doing the wrong thing…
> >
> > Let's walk over a simple example
> >
> > ipv4: 127.0.0.1
> > ipv6: 0:0:0:0:0::7f00:0001
> >
> > Both of these addresses are equal according to networking and java… but for C* they are different! These are 2 different values as ipv4 is 4 bytes and ipv6 is 16 bytes, so 4 != 16!
> >
> > With SAI we convert all ipv4 to ipv6 so that the search logic is correct… this causes SAI to return partitions that ALLOW FILTERING and other indexes wouldn’t…
> >
> > This gets to the question in the subject… what SHOULD we do for this type?
> >
> > I see 3 options:
> >
> > 1) SAI use the custom C* semantics where 4 != 16… this keeps us consistent…
> > 2) ALLOW FILTERING and other indexes are “fixed” so that we actually match correctly… we are not really able to fix if the type is in a partition or clustering column though…
> > 3) deprecate inet in favor of an inet_better type… where inet semantics is the custom C* semantics and inet_better handles this case
> >
> > Thoughts?
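To make the "4 != 16" point concrete, here is a minimal Python sketch, not Cassandra code. It assumes the thread's IPv6 example is meant to be the IPv4-mapped form of 127.0.0.1, which RFC 4291 writes as `::ffff:127.0.0.1` (equivalently `::ffff:7f00:1`). The helper names `raw_bytes` and `widened_bytes` are hypothetical, and the exact encoding SAI uses for its 4 -> 16 byte conversion is assumed here to be the same IPv4-mapped layout.

```python
import ipaddress


def raw_bytes(text: str) -> bytes:
    """Serialize an address at its native width: 4 bytes for IPv4, 16 for IPv6."""
    return ipaddress.ip_address(text).packed


def widened_bytes(text: str) -> bytes:
    """Widen IPv4 to a 16-byte IPv4-mapped IPv6 value; leave IPv6 untouched.

    Assumption: the layout is ten zero bytes, then 0xff 0xff, then the
    four IPv4 bytes, as defined for IPv4-mapped addresses in RFC 4291.
    """
    addr = ipaddress.ip_address(text)
    if isinstance(addr, ipaddress.IPv4Address):
        return bytes(10) + b"\xff\xff" + addr.packed
    return addr.packed


v4 = "127.0.0.1"
v6 = "::ffff:7f00:1"  # assumed IPv4-mapped spelling of the same address

# Compared at native width, the two values differ in length and content.
print(len(raw_bytes(v4)), len(raw_bytes(v6)))
print(raw_bytes(v4) == raw_bytes(v6))

# Compared after widening, they are byte-for-byte identical.
print(widened_bytes(v4) == widened_bytes(v6))
```

The first comparison corresponds to the raw-byte semantics described for the rest of C*, the second to an index that converts every value to 16 bytes before comparing.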
Re: [DISCUSS] What SHOULD we do when we index an inet type that is ipv4?
I think the answer to that is, if an inet type column is a partition key, can I write to it in IPv4 and then query it with IPv6 and find the record? I believe the behaviour between SAI and partition key should be the same.

On 07/03/2024 17:43, Caleb Rackliffe wrote:

> Yeah, what we have with inet is much like if we had a type like "numeric" that allowed you to write both ints and doubles. If we had actual "inet4" and "inet6" types, SAI would have been able to index them as fixed-length values without doing the 4 -> 16 byte conversion.
>
> Given SAI could easily change this to go one way or another at post-filtering time, perhaps there's another option:
>
> 4.) Have an option on the column index that allows the user to specify whether ipv4 and ipv6 addresses are comparable. If they are, nothing changes. If they aren't, we can just take the matches from the index and filter "strictly".
>
> I'm not sure what's best here, because what it seems to hinge on is what users actually want to do when they throw both v4 and v6 addresses into a single column. Without any real loss in storage efficiency, you could index them in two separate columns on the same table, and none of this matters. If they are mixed, it feels like we should at least have the option to make them comparable, kind of like we have the option to make text case-insensitive or unicode normalized right now.
>
> On Wed, Mar 6, 2024 at 4:35 PM Bowen Song via dev wrote:
>
> > Technically, 127.0.0.1 (IPv4) is not 0:0:0:0:0::7f00:0001 (IPv6), but their values are equal. Just like 1.0 (double) is not 1 (int), but their values are equal. So, what is the meaning of "=" in CQL?
> >
> > On 06/03/2024 21:36, David Capwell wrote:
> > > So, was reviewing SAI and found we convert ipv4 to ipv6 (which is valid for the type) and made me wonder what the behavior would be if client mixed ipv4 with ipv4 encoded as ipv6… this caused me to find a different behavior in SAI to the rest of C*… where I feel C* is doing the wrong thing…
> > >
> > > Let's walk over a simple example
> > >
> > > ipv4: 127.0.0.1
> > > ipv6: 0:0:0:0:0::7f00:0001
> > >
> > > Both of these addresses are equal according to networking and java… but for C* they are different! These are 2 different values as ipv4 is 4 bytes and ipv6 is 16 bytes, so 4 != 16!
> > >
> > > With SAI we convert all ipv4 to ipv6 so that the search logic is correct… this causes SAI to return partitions that ALLOW FILTERING and other indexes wouldn’t…
> > >
> > > This gets to the question in the subject… what SHOULD we do for this type?
> > >
> > > I see 3 options:
> > >
> > > 1) SAI use the custom C* semantics where 4 != 16… this keeps us consistent…
> > > 2) ALLOW FILTERING and other indexes are “fixed” so that we actually match correctly… we are not really able to fix if the type is in a partition or clustering column though…
> > > 3) deprecate inet in favor of an inet_better type… where inet semantics is the custom C* semantics and inet_better handles this case
> > >
> > > Thoughts?
Re: [DISCUSS] What SHOULD we do when we index an inet type that is ipv4?
> if an inet type column is a partition key, can I write to it in IPv4 and then query it with IPv6 and find the record?

You can't...however...

Especially when the original/existing behavior here was possibly not all that well-conceived, I think it would at least be a good idea to maintain an *ability* to query inet columns that is v4/v6 format agnostic. We could change nothing in the SAI index itself and control this behavior at post-filtering w/ a few lines of code. (i.e. The default could still be to compare the raw bytes like we would do w/ a key element.)
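
For illustration only, the post-filtering idea could look roughly like the Python sketch below. It is not SAI code: `post_filter`, `widen`, and the `comparable` flag are hypothetical names, and the IPv4-mapped layout used for widening is an assumption. The default path compares raw bytes, as suggested above; the opt-in path compares values after widening both sides to 16 bytes.

```python
import ipaddress


def widen(raw: bytes) -> bytes:
    """Map a 4-byte IPv4 value to 16 bytes (assumed IPv4-mapped layout)."""
    if len(raw) == 4:
        return bytes(10) + b"\xff\xff" + raw
    return raw


def post_filter(candidates, query: bytes, comparable: bool = False):
    """Keep the index candidates whose stored value matches `query`.

    comparable=False: strict comparison of the stored bytes (hypothetical default).
    comparable=True:  v4/v6 format-agnostic comparison after widening.
    """
    if comparable:
        target = widen(query)
        return [value for value in candidates if widen(value) == target]
    return [value for value in candidates if value == query]


# Candidates as an index that widens everything to 16 bytes might return them:
# the same address stored once as IPv4 and once as IPv4-mapped IPv6.
stored = [
    ipaddress.ip_address("127.0.0.1").packed,
    ipaddress.ip_address("::ffff:127.0.0.1").packed,
]
query = ipaddress.ip_address("127.0.0.1").packed

strict = post_filter(stored, query)                   # raw-byte match only
loose = post_filter(stored, query, comparable=True)   # both spellings match
print(len(strict), len(loose))
```

In this sketch the index contents never change; only the final comparison does, which is what keeps the option cheap to add.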