Re: [DISCUSS] What SHOULD we do when we index an inet type that is ipv4?

2024-03-07 Thread Caleb Rackliffe
Yeah, what we have with inet is much like if we had a type like "numeric"
that allowed you to write both ints and doubles. If we had actual "inet4"
and "inet6" types, SAI would have been able to index them as fixed length
values without doing the 4 -> 16 byte conversion. Given SAI could easily
change this to go one way or another at post-filtering time, perhaps
there's another option:

4.) Have an option on the column index that allows the user to specify
whether ipv4 and ipv6 addresses are comparable. If they are, nothing
changes. If they aren't, we can just take the matches from the index and
filter "strictly".

I'm not sure what's best here, because what it seems to hinge on is what
users actually want to do when they throw both v4 and v6 addresses into a
single column. Without any real loss in storage efficiency, you could index
them in two separate columns on the same table, and none of this matters.
If they are mixed, it feels like we should at least have the option to make
them comparable, kind of like we have the option to make text
case-insensitive or unicode normalized right now.
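
A minimal sketch of the 4 -> 16 byte conversion mentioned above, for illustration only. It assumes the IPv4-mapped IPv6 layout (::ffff:a.b.c.d); the thread does not state which layout SAI actually uses, and `InetWiden` and `widen` are hypothetical names, not Cassandra APIs.

```java
import java.util.Arrays;

public class InetWiden {
    // Hypothetical helper: widen a 4-byte IPv4 address to a 16-byte value
    // using the IPv4-mapped layout (an assumption, see the note above).
    // A 16-byte input is returned unchanged.
    static byte[] widen(byte[] addr) {
        if (addr.length == 16) return addr;
        if (addr.length != 4) throw new IllegalArgumentException("expected 4 or 16 bytes");
        byte[] out = new byte[16];              // bytes 0..9 stay zero
        out[10] = (byte) 0xff;
        out[11] = (byte) 0xff;
        System.arraycopy(addr, 0, out, 12, 4);  // IPv4 bytes go last
        return out;
    }

    public static void main(String[] args) {
        byte[] v4 = {127, 0, 0, 1};
        byte[] mapped = widen(v4);
        // Raw bytes differ in length, so a byte-wise comparison fails...
        System.out.println(Arrays.equals(v4, mapped));
        // ...but once both sides are widened they compare equal.
        System.out.println(Arrays.equals(widen(v4), widen(mapped)));
    }
}
```

With a single fixed-length form, every indexed value is 16 bytes, which is what makes v4 and v6 entries comparable inside the index.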

On Wed, Mar 6, 2024 at 4:35 PM Bowen Song via dev wrote:

> Technically, 127.0.0.1 (IPv4) is not 0:0:0:0:0::7f00:0001 (IPv6),
> but their values are equal. Just like 1.0 (double) is not 1 (int), but
> their values are equal. So, what is the meaning of "=" in CQL?
>
> On 06/03/2024 21:36, David Capwell wrote:
> > So, was reviewing SAI and found we convert ipv4 to ipv6 (which is valid
> > for the type) and made me wonder what the behavior would be if client mixed
> > ipv4 with ipv4 encoded as ipv6… this caused me to find a different behavior
> > in SAI to the rest of C*… where I feel C* is doing the wrong thing…
> >
> > Let's walk over a simple example
> >
> > ipv4: 127.0.0.1
> > ipv6: 0:0:0:0:0::7f00:0001
> >
> > Both of these addresses are equal according to networking and java… but
> > for C* they are different!  These are 2 different values as ipv4 is 4 bytes
> > and ipv6 is 16 bytes, so 4 != 16!
> >
> > With SAI we convert all ipv4 to ipv6 so that the search logic is
> > correct… this causes SAI to return partitions that ALLOW FILTERING and
> > other indexes wouldn’t…
> >
> > This gets to the question in the subject… what SHOULD we do for this
> > type?
> >
> > I see 3 options:
> >
> > 1) SAI use the custom C* semantics where 4 != 16… this keeps us
> > consistent…
> > 2) ALLOW FILTERING and other indexes are “fixed” so that we actually
> > match correctly… we are not really able to fix if the type is in a
> > partition or clustering column though…
> > 3) deprecate inet in favor of an inet_better type… where inet semantics
> > is the custom C* semantics and inet_better handles this case
> >
> > Thoughts?
>


Re: [DISCUSS] What SHOULD we do when we index an inet type that is ipv4?

2024-03-07 Thread Bowen Song via dev
I think the answer to that hinges on another question: if an inet type
column is a partition key, can I write to it in IPv4, then query it with
IPv6 and find the record? I believe the behaviour of SAI and of partition
keys should be the same.
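
To make the partition key question concrete, here is a rough sketch that assumes keys are matched on their raw serialized bytes, as the "4 != 16" description elsewhere in this thread suggests. The `HashMap` is only a stand-in for key lookup, and `InetKeyLookup`, `write` and `read` are hypothetical names, not Cassandra code.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class InetKeyLookup {
    // Stand-in for a table keyed by the raw bytes of an inet partition key.
    static final Map<ByteBuffer, String> rows = new HashMap<>();

    static void write(byte[] key, String value) {
        rows.put(ByteBuffer.wrap(key.clone()), value);
    }

    static String read(byte[] key) {
        // ByteBuffer equality is byte-wise, so length differences matter.
        return rows.get(ByteBuffer.wrap(key));
    }

    public static void main(String[] args) {
        byte[] v4 = {127, 0, 0, 1};
        // The same address in a 16-byte IPv4-mapped form (assumed layout).
        byte[] v6 = new byte[16];
        v6[10] = (byte) 0xff;
        v6[11] = (byte) 0xff;
        v6[12] = 127;
        v6[15] = 1;

        write(v4, "row written as IPv4");
        System.out.println(read(v4)); // found: same 4 bytes
        System.out.println(read(v6)); // not found: 16 bytes never equal 4 bytes
    }
}
```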




Re: [DISCUSS] What SHOULD we do when we index an inet type that is ipv4?

2024-03-07 Thread Caleb Rackliffe
> if an inet type column is a partition key, can I write to it in IPv4 and
> then query it with IPv6 and find the record?

You can't...however...

Especially when the original/existing behavior here was possibly not all
that well-conceived, I think it would at least be a good idea to maintain
an *ability* to query inet columns that is v4/v6 format agnostic. We could
change nothing in the SAI index itself and control this behavior at
post-filtering w/ a few lines of code. (i.e. The default could still be to
compare the raw bytes like we would do w/ a key element.)
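
The post-filtering switch described above might look roughly like the sketch below. All names (`InetPostFilter`, `matches`, `postFilter`, the `formatAgnostic` flag) are hypothetical, and the IPv4-mapped widening is an assumption rather than a statement about what SAI does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InetPostFilter {
    // Assumed widening: 4-byte IPv4 -> 16-byte IPv4-mapped form.
    static byte[] widen(byte[] addr) {
        if (addr.length == 16) return addr;
        if (addr.length != 4) throw new IllegalArgumentException("expected 4 or 16 bytes");
        byte[] out = new byte[16];
        out[10] = (byte) 0xff;
        out[11] = (byte) 0xff;
        System.arraycopy(addr, 0, out, 12, 4);
        return out;
    }

    // Strict mode compares raw stored bytes, like a key element would;
    // format-agnostic mode compares the widened forms.
    static boolean matches(byte[] stored, byte[] queried, boolean formatAgnostic) {
        return formatAgnostic
             ? Arrays.equals(widen(stored), widen(queried))
             : Arrays.equals(stored, queried);
    }

    // The index returns candidates; this step decides which ones survive.
    static List<byte[]> postFilter(List<byte[]> candidates, byte[] queried, boolean formatAgnostic) {
        List<byte[]> kept = new ArrayList<>();
        for (byte[] stored : candidates) {
            if (matches(stored, queried, formatAgnostic)) kept.add(stored);
        }
        return kept;
    }

    public static void main(String[] args) {
        byte[] v4 = {127, 0, 0, 1};
        List<byte[]> candidates = List.of(v4, widen(v4)); // both hit the index
        System.out.println(postFilter(candidates, v4, false).size()); // strict: 1
        System.out.println(postFilter(candidates, v4, true).size());  // agnostic: 2
    }
}
```

Because the index contents are identical in both modes, a switch of this kind would not require rebuilding the index.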