Re: Meaningless emptiness and filtering

2025-02-11 Thread Caleb Rackliffe
The case where allowsEmpty == true AND is meaningless == true is especially
confusing. If I could design this from scratch, I would reject writes and
filtering on EMPTY values for int and the other types where meaningless ==
true. (In other words, if we allow EMPTY, it is meaningful and queryable.
If we don't, it isn't.) That avoids problems that can't have anything other
than an arbitrary solution, like what we do with < and > for EMPTY for int.
When we add IS [NOT] NULL support, that would preferably NOT match EMPTY
values for the types where empty means something, like strings. For
everything else, EMPTY could be equivalent to null and match IS NULL.

The only real way to make SAI compatible with the current behavior is to
add something like a special postings list to its data structures that
corresponds to the rows where the indexed column value is EMPTY.
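
To make the postings-list idea concrete, here is a minimal sketch. It is not SAI's actual data structure or on-disk format. It assumes a toy in-memory index over an int column, and every name in it (`EmptyAwareIndexSketch`, `add`, `search`, `searchEmpty`) is hypothetical and exists only for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory index: one postings list per int term, plus a separate list for EMPTY. */
final class EmptyAwareIndexSketch
{
    // term -> ids of the rows whose column decodes to that int
    private final Map<Integer, List<Long>> postings = new TreeMap<>();
    // ids of the rows whose column holds zero-length bytes; kept apart
    // because EMPTY has no position in the int ordering (assumption of this sketch)
    private final List<Long> emptyPostings = new ArrayList<>();

    void add(long rowId, ByteBuffer value)
    {
        if (!value.hasRemaining())
            emptyPostings.add(rowId);
        else
            postings.computeIfAbsent(value.getInt(value.position()), k -> new ArrayList<>()).add(rowId);
    }

    List<Long> search(int term)
    {
        return postings.getOrDefault(term, List.of());
    }

    List<Long> searchEmpty()
    {
        return emptyPostings;
    }

    public static void main(String[] args)
    {
        EmptyAwareIndexSketch index = new EmptyAwareIndexSketch();
        index.add(1L, ByteBuffer.allocate(4).putInt(0, 42));
        index.add(2L, ByteBuffer.allocate(0));
        index.add(3L, ByteBuffer.allocate(0));
        System.out.println(index.search(42));     // rows whose value is 42
        System.out.println(index.searchEmpty());  // rows whose value is EMPTY
    }
}
```

Keeping EMPTY rows in their own list means an equality query on EMPTY can be answered without inventing an arbitrary position for EMPTY among the int terms.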

On Tue, Feb 11, 2025 at 12:21 PM David Capwell  wrote:

> Bringing this discussion to dev@ rather than Slack as we try to figure
> out CASSANDRA-20313 and CASSANDRA-19461.
>
> In the type system, we have 2 different (but related) methods:
>
> AbstractType#allowsEmpty - if the user gives empty bytes (new byte[0]),
> will the type reject it?
> AbstractType#isEmptyValueMeaningless - if the user gives empty bytes,
> should this be handled like null?
>
> In practice, there are 2 cases that matter:
>
> allowsEmpty = true AND is meaningless = false - stuff like text and bytes
> allowsEmpty = true AND is meaningless = true  - many types, example "int"
>
> What this means is that users are able to use empty bytes when writing to
> these types, but this leads to complexity in the filter path, and we are
> trying to flesh out the “correct” semantics for SAI.
>
> Simple example:
>
> {code}
> @Test
> public void test() throws IOException
> {
>     try (Cluster cluster = Cluster.build(1).start())
>     {
>         init(cluster);
>         cluster.schemaChange(withKeyspace("CREATE TABLE %s.tbl (pk int primary key, v int)"));
>         IInvokableInstance node = cluster.get(1);
>         // write 10 rows whose int column v holds empty bytes
>         for (int i = 0; i < 10; i++)
>             node.executeInternal(withKeyspace("INSERT INTO %s.tbl (pk, v) VALUES (?, ?)"), i, ByteBufferUtil.EMPTY_BYTE_BUFFER);
>
>         // filter on v = empty bytes and print whatever comes back
>         var qr = node.executeInternalWithResult(withKeyspace("SELECT * FROM %s.tbl WHERE v=? ALLOW FILTERING"), ByteBufferUtil.EMPTY_BYTE_BUFFER);
>         StringBuilder sb = new StringBuilder();
>         sb.append(qr.names());
>         while (qr.hasNext())
>         {
>             var next = qr.next();
>             sb.append('\n').append(next);
>         }
>         System.out.println(sb);
>     }
> }
> {code}
>
> “Should” this return 10 rows or 0?  In this case, the type is int, and int
> defines empty as meaningless, which means it should act as a null; yet this
> query returns 10 rows, which violates CQL semantics, where foo = null
> evaluates to false.
>
> Right now there really isn’t a way to query for NULL (CASSANDRA-10715 is
> still open), but if we did add such a thing we would also need to figure
> out the semantics with regard to these cases.
>
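
The question in the example above can be restated as a small sketch. This is not Cassandra code: `EmptyFilterSketch`, `SketchType`, `matchesByBytes` and `matchesNullLike` are hypothetical names, and the single flag on `SketchType` only mirrors the two cases listed above (text: empty is a value; int: empty is meaningless).

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.List;

final class EmptyFilterSketch
{
    /** Hypothetical stand-in for the two cases listed in the thread. */
    enum SketchType
    {
        TEXT(false), // empty is a real value
        INT(true);   // empty is "meaningless"

        final boolean emptyIsMeaningless;

        SketchType(boolean emptyIsMeaningless)
        {
            this.emptyIsMeaningless = emptyIsMeaningless;
        }
    }

    /** Byte-wise equality: EMPTY = EMPTY matches, which is what the example observes. */
    static boolean matchesByBytes(ByteBuffer stored, ByteBuffer queried)
    {
        return stored.equals(queried);
    }

    /** Null-like alternative: where empty is meaningless, EMPTY never equals anything. */
    static boolean matchesNullLike(SketchType type, ByteBuffer stored, ByteBuffer queried)
    {
        boolean eitherEmpty = !stored.hasRemaining() || !queried.hasRemaining();
        if (type.emptyIsMeaningless && eitherEmpty)
            return false;
        return stored.equals(queried);
    }

    public static void main(String[] args)
    {
        ByteBuffer empty = ByteBuffer.allocate(0);
        List<ByteBuffer> rows = Collections.nCopies(10, empty); // ten rows with v = EMPTY
        long byBytes = rows.stream().filter(v -> matchesByBytes(v, empty)).count();
        long nullLike = rows.stream().filter(v -> matchesNullLike(SketchType.INT, v, empty)).count();
        System.out.println(byBytes + " rows vs " + nullLike + " rows");
    }
}
```

Under these assumptions the byte-wise predicate matches all 10 rows and the null-like predicate matches none, which is exactly the 10-versus-0 choice being asked about.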


Re: Meaningless emptiness and filtering

2025-02-11 Thread Jeremiah Jordan
AFAIK this EMPTY stuff goes back to thrift days.  We let people insert
these zero length values back then, so we have to support those zero length
values existing forever :/.

How useful is such a distinction?  I don’t know.  Is anybody actually doing
this?  Well Andres brought up
https://issues.apache.org/jira/browse/CASSANDRA-20313 as a problem because
we had an end user create an SAI index on a column which contained EMPTY
values in it.  So people are inserting these into the database.  Would they
expect to be able to query by EMPTY?  I do not know.

This is the first I have heard of the “isEmptyValueMeaningless” setting.
The meaning of EMPTY to me has always been the same for an Integer or a
String, “this column has a value of no value” vs NULL which means “this
column is not set/has no value”.  If we truly want to follow the spirit of
that setting, then maybe we should be converting such values into a
tombstone / NULL up front when deserializing them, rather than storing the
EMPTY byte buffer in the DB?

Anyway, I am kind of rambling here.  I am of two minds.
I can see that this does seem like a silly distinction to have for some
types, so maybe we should just decide that in a CQL world, EMPTY means NULL
for some types, and actually just make that a tombstone.  Maybe 6.0 would
be a good major version change to make such a “breaking” behavior change in.

I can also see the “don’t screw up the legacy apps” use case.  Everything
besides SAI, including the table-based 2i and ALLOW FILTERING, treats EMPTY
as a distinct value which can be inserted and queried on.  We have supported
it in the past, so we should continue to support it into the future, even
if it is painful to do.

Flip a coin and I can argue either side.  So I would love to hear others'
thoughts to convince me one way or the other.

-Jeremiah




Re: Meaningless emptiness and filtering

2025-02-11 Thread David Capwell
Thanks for the reply!

> AFAIK this EMPTY stuff goes back to thrift days.

This is what I was told, but the expected semantics are not clear so my goal is 
to help flesh things out.

> We let people insert these zero length values back then, so we have to 
> support those zero length values existing forever :/.

We allow this for some types but not all.  I think where I am coming from is 
write != select, so if we say empty = “no value” or “null” then why does select 
treat it as a value?  Is this the expected behavior?

> maybe we should be converting such values into a tombstone / NULL

Tombstones can be purged, whereas empty can’t, so should it?

> Everything besides SAI, including the table-based 2i and ALLOW FILTERING, 
> treats EMPTY as a distinct value which can be inserted and queried on.  We have 
> supported it in the past, so we should continue to support it into the 
> future, even if it is painful to do.

I guess where I come from here is what semantics do we expect.

So let's say v0 is an int column holding empty bytes.

SELECT CAST(v0 AS text) 

Is this null or empty bytes?  In our project this is null

SELECT JSON v0

Is this null or empty bytes?  In our project this is null

SELECT avg(v0) …

Is this null or empty bytes?  In our project this is null

So in most places that touch empty bytes we treat them as null; only in 
filtering do we not.
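
As a rough illustration of the asymmetry described here, the sketch below shows what "treat empty bytes as null" could look like on a read path. It is a sketch under stated assumptions, not the project's code: `EmptyAsNullSketch`, `decodeInt` and `avg` are hypothetical helpers, and skipping null values inside the average is an assumption made for the example.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Objects;
import java.util.OptionalDouble;

final class EmptyAsNullSketch
{
    /** Hypothetical decoder: zero-length bytes decode to null rather than to an int. */
    static Integer decodeInt(ByteBuffer value)
    {
        return value.hasRemaining() ? Integer.valueOf(value.getInt(value.position())) : null;
    }

    /** Average over the non-null decoded values; an empty result if there are none. */
    static OptionalDouble avg(List<ByteBuffer> column)
    {
        return column.stream()
                     .map(EmptyAsNullSketch::decodeInt)
                     .filter(Objects::nonNull)
                     .mapToInt(Integer::intValue)
                     .average();
    }

    public static void main(String[] args)
    {
        ByteBuffer empty = ByteBuffer.allocate(0);
        ByteBuffer seven = ByteBuffer.allocate(4).putInt(0, 7);
        System.out.println(decodeInt(empty));            // null
        System.out.println(avg(List.of(empty, empty)));  // no value to average
        System.out.println(avg(List.of(empty, seven)));  // only the 7 contributes
    }
}
```

A filter that compares the raw bytes instead of going through a decoder like this one would still see EMPTY as an ordinary value, which is the inconsistency being pointed out.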


Re: Meaningless emptiness and filtering

2025-02-11 Thread Benedict
Perhaps we should reify this in the type system? Introduce a MAYBE EMPTY
modifier for column types, or simply name them, e.g. [int] vs int?

I think the problem is today it’s all implicit and - as David says -
inconsistent. It would be nice to move away from this as the default for a
variety of reasons, but also nice to make the behaviour well defined for
those use cases we think we’re supporting.

Re: Meaningless emptiness and filtering

2025-02-11 Thread Paulo Motta
On Tue, Feb 11, 2025 at 5:00 PM Patrick McFadin  wrote:
>
> You get my vote for the best subject line I've seen this week.
>

+1, I'm deeply saddened that despite this title this post does not
contain the answer to the ultimate question of life, the universe, and
everything. :-(



Re: Meaningless emptiness and filtering

2025-02-11 Thread David Capwell
My planet was destroyed to make way for a hyperspatial express route before I 
could get the answer, sorry… anyone interested in getting lunch at the end of 
the universe?


Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-11 Thread guo Maxwell
I think it may be better to use LOOSE_NOT_NULL instead of NOT_NULL.
The reason is that NOT_NULL can easily lead users to assume it behaves like
the MySQL feature of the same name, when in fact ours is different.
Choosing a different name may avoid users' preconceptions.

On Tue, Feb 11, 2025 at 01:55, Dinesh Joshi wrote:

> On Mon, Feb 10, 2025 at 9:05 AM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> We have consensus then. Let’s ditch the non strict version, and rename
>> the STRICTLY_NOT_NULL to NOT_NULL.
>>
>
> Can you give this thread at least 24-48 hours to ensure we capture any
> other perspectives?
>


[DISCUSS] JVM HEAP and G1 flag improvements for 5.0 (CASSANDRA-20296)

2025-02-11 Thread Mick Semb Wever
Any objections to also making the following changes in 5.0?

CASSANDRA-20296 proposes the following changes:

1. G1 to use `-XX:G1NewSizePercent=50` by default, to floor the young
generation's size at 50% of the heap. (We know in production this can
often be raised to 66% for optimal performance.)

2. When using G1, set `-XX:ParallelGCThreads` and `-XX:ConcGCThreads` by
default to the number of system CPU cores. (Existing settings are
honoured.)

3. The auto-generated heap size is now half the server's physical RAM,
capped at 16G for CMS and 31G for G1. This simplifies the previous
algorithm and makes it more appropriate.  As before, this is only used if
MAX_HEAP_SIZE or -Xmx hasn't been set.

4. Increase MaxTenuringThreshold from 1 to 2. There is now plenty of
evidence that it has no negatives (over values of zero or one), but can
sometimes have significant benefits in keeping objects in the young
generation, while values above 2 don't have any noticeable benefit.

5. Set CASSANDRA_HEAPDUMP_DIR to $CASSANDRA_LOG_DIR by default, to avoid
hprof filling up unexpected disk volumes. The assumption here is that the
logs directory is large enough to handle these dumps, and/or operators
are monitoring this directory more than other random/unknown
directories.  Existing values of CASSANDRA_HEAPDUMP_DIR are honoured.


As can be seen, these changes only affect users who haven't set any of
these flags, and a number of the current defaults are wildly bad.  The
only question I have is how this might impact the dev experience on local
machines where the RAM is already used up.  (It would fail fast and the
dev would have to set a lower MAX_HEAP_SIZE, which I think is trivial
compared to the benefits here.)
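
For readers who want the arithmetic of item 3 spelled out, here is a minimal sketch. It is not the implementation from CASSANDRA-20296; `HeapSizeSketch`, `Collector` and `autoHeapMb` are hypothetical names, and only the "half of RAM, capped at 16G for CMS and 31G for G1" rule is taken from the proposal above.

```java
final class HeapSizeSketch
{
    enum Collector { CMS, G1 }

    /** Half of physical RAM, capped at 16G for CMS and 31G for G1 (all values in MiB). */
    static long autoHeapMb(long physicalRamMb, Collector collector)
    {
        long capMb = (collector == Collector.G1 ? 31L : 16L) * 1024L;
        return Math.min(physicalRamMb / 2, capMb);
    }

    public static void main(String[] args)
    {
        // 16 GiB machine: half of RAM is below both caps
        System.out.println(autoHeapMb(16L * 1024, Collector.G1));   // 8192
        // 128 GiB machine: the cap applies
        System.out.println(autoHeapMb(128L * 1024, Collector.G1));  // 31744
        System.out.println(autoHeapMb(128L * 1024, Collector.CMS)); // 16384
    }
}
```

As the proposal notes, a value computed this way would only apply when neither MAX_HEAP_SIZE nor -Xmx has been set.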


Re: [DISCUSS] JVM HEAP and G1 flag improvements for 5.0 (CASSANDRA-20296)

2025-02-11 Thread Jon Haddad
+1 to putting it in 5.0


On Tue, Feb 11, 2025 at 2:10 AM Mick Semb Wever  wrote:

> Any objections to making the following changes also to 5.0 ?
>
> CASSANDRA-20296 proposes the following changes:
>
> 1. G1 to by default use `-XX:G1NewSizePercent=50` to floor the young
> generation's size to 50% of the heap. (We know in production this can
> often be raised to 66% for optimal performance.)
>
> 2. When using G1, default set `-XX:ParallelGCThreads` and
> `-XX:ConcGCThreads` to the number of system cpu cores. (Existing
> settings are honoured.)
>
> 3. The auto-generated heap size is now half the server's physical RAM,
> capped at 16G for CMS and 31G for G1. This simplifies, and makes more
> appropriate, the previous algorithm.  Like before this is only used if
> MAX_HEAP_SIZE or -Xmx hasn't been set.
>
> 4. Increase MaxTenuringThreshold from 1 to 2. Plenty of evidence now
> showing it has no negatives (over values of zero or one), but can
> sometimes have significant benefits in keeping objects in the young
> generation. While values above 2 don't have any noticeable benefit.
>
> 5. Default set CASSANDRA_HEAPDUMP_DIR to $CASSANDRA_LOG_DIR, to avoid
> hprof filling up unexpected disk volumes. Assumption here is that the
> logs directory is large enough to handle these dumps, and/or operators
> are monitoring these directories more than other random/unknown
> directories.  Existing values of CASSANDRA_HEAPDUMP_DIR are honoured.
>
>
> As can be seen, these changes are only going to impact those users
> that haven't set any of these flags.  And a number of the defaults are
> wildly bad.  The only question I have is how this might impact the dev
> experience on local machines where the ram is already used up.  (It
> would be a fail-fast and the dev would have to set a lower
> MAX_HEAP_SIZE, which I think is trivial compared to the benefits
> here.)
>
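
Item 3 of the quoted proposal (half of physical RAM, capped at 16G for CMS and 31G for G1, used only when MAX_HEAP_SIZE or -Xmx is unset) can be sketched as below. This is an illustrative Java sketch under stated assumptions, not the project's actual startup-script logic: the class `HeapSizeSketch`, its method names, and the use of megabytes as the unit are all hypothetical.

```java
/** Hypothetical sketch of the proposed default heap sizing; not the real startup script. */
final class HeapSizeSketch {

    enum Collector { CMS, G1 }

    static final long CMS_CAP_MB = 16L * 1024; // 16G cap quoted in the proposal
    static final long G1_CAP_MB = 31L * 1024;  // 31G cap quoted in the proposal

    /** Auto-generated heap: half of physical RAM, capped per collector. */
    static long autoHeapMb(long physicalRamMb, Collector collector) {
        long half = physicalRamMb / 2;
        long cap = (collector == Collector.G1) ? G1_CAP_MB : CMS_CAP_MB;
        return Math.min(half, cap);
    }

    /**
     * An explicit user setting (modelled here as a non-null value) always wins;
     * the auto-generated size is only a fallback.
     */
    static long effectiveHeapMb(Long userMaxHeapMb, long physicalRamMb, Collector collector) {
        return (userMaxHeapMb != null) ? userMaxHeapMb : autoHeapMb(physicalRamMb, collector);
    }

    public static void main(String[] args) {
        // A small dev machine, then a large server under each collector.
        System.out.println(autoHeapMb(8L * 1024, Collector.G1));
        System.out.println(autoHeapMb(128L * 1024, Collector.CMS));
        System.out.println(autoHeapMb(128L * 1024, Collector.G1));
    }
}
```

The fallback shape is the point here: an operator who has already set a heap size sees no change, which matches the claim that only users who have not set these flags are affected.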


Re: [VOTE] Release Apache Cassandra Java Driver 4.19.0

2025-02-11 Thread Josh McKenzie
+1

On Mon, Feb 10, 2025, at 6:34 PM, Nate McCall wrote:
> +1
> Verified sigs and artifact coordinates.
> 
> On Tue, Feb 11, 2025 at 12:30 PM Brandon Williams  wrote:
>> +1
>> 
>> Checked sha/sig, maven artifacts, built on j8.
>> 
>> Kind Regards,
>> Brandon
>> 
>> On Thu, Feb 6, 2025 at 4:34 PM Bret McGuire  wrote:
>> >
>> >Greetings all!  I’m proposing the test build of Cassandra Java Driver 
>> > 4.19.0 for release.
>> >
>> >
>> > sha1: 46444eaabdbd23e9231123198536d070e99aca27
>> >
>> > Git: https://github.com/apache/cassandra-java-driver/tree/4.19.0
>> >
>> > Maven Artifacts:
>> > https://repository.apache.org/content/repositories/orgapachecassandra-1364/
>> >
>> >
>> >The vote will be open for 120 hours (longer if needed). Everyone who 
>> > has tested the build is invited to vote. Votes by PMC members are 
>> > considered binding. A vote passes if there are at least three binding +1s 
>> > and no -1's.
>> >
>> >
>> >Thanks!

CVE-2025-26467: Apache Cassandra: User with MODIFY permission on ALL KEYSPACES can escalate privileges to superuser via unsafe actions (4.0.16 only)

2025-02-11 Thread Paulo Motta
Severity: moderate

Affected versions:

- Apache Cassandra 4.0.16

Description:

Privilege Defined With Unsafe Actions vulnerability in Apache Cassandra. A
user with MODIFY permission ON ALL KEYSPACES can escalate privileges to 
superuser within a targeted Cassandra cluster via unsafe actions to a system 
resource. Operators granting data MODIFY permission on all keyspaces on 
affected versions should review data access rules for potential breaches.



This issue affects Apache Cassandra 3.0.30, 3.11.17, 4.0.16, 4.1.7, 5.0.2, but 
this advisory is only for 4.0.16 because the fix to CVE-2025-23015 was 
incorrectly applied to 4.0.16, so that version is still affected.

Users in the 4.0 series are recommended to upgrade to version 4.0.17, which
fixes the issue. Users of the 3.0, 3.11, 4.1 and 5.0 series should follow
the recommendation in CVE-2025-23015.

Credit:

Adam Pond of Apple Services Engineering Security (finder)
Ali Mirheidari of Apple Services Engineering Security (finder)
Terry Thibault of Apple Services Engineering Security (finder)
Will Brattain of Apple Services Engineering Security (finder)

References:

https://cassandra.apache.org/
https://www.cve.org/CVERecord?id=CVE-2025-26467



JDK 24 Release Candidate | JavaOne and More Heads-Up

2025-02-11 Thread David Delabassee via dev
Welcome to the first OpenJDK Quality Outreach update of 2025!

The first Release Candidate builds of JDK 24 are now available [1], and at this
stage, only P1 issues will be evaluated. With the JDK 24 General Availability 
set for March 18th, the attention is now turning to JDK 25.

JDK 24 will officially launch at JavaOne in Redwood Shores, CA [2]. If you're 
attending or planning to attend JavaOne, please reach out as I’m planning a 
Quality Outreach gathering.

To conclude, make sure to take a look at the heads-up below.

[1] https://jdk.java.net/24/
[2] https://javaone.com/


# Heads-up - JDK 24: Remote Debugging with `jstat` and `jhsdb` is Deprecated for Removal

Java's Remote Method Invocation (RMI), introduced in 1997, enables remote 
procedure calls between different JVMs. RMI relies on serialization to encode 
objects into byte streams when sending them as arguments and return values 
between JVMs. Both technologies have long-term security issues and 
configuration challenges, and they haven't withstood the test of time. Today, 
the broader ecosystem has moved away from RMI in favor of more web-friendly 
protocols, and as a result, Java is also gradually reducing and eliminating its 
dependencies on it where possible.

Among other tools, Java offers these two tools to connect to a local HotSpot 
JVM and observe or debug it as well as the program it executes:

- `jstat` reads performance counters
- `jhsdb` provides snapshot debugging and analysis features

Both `jstat` and `jhsdb` offer remote capabilities, which are implemented using 
RMI. Due to the aforementioned effort to reduce dependencies on RMI, the remote 
capabilities of `jstat` and `jhsdb` are deprecated for removal in JDK 24:

- JDK-8327793 [3]: `jstatd` (allows remote connections with `jstat`)
- JDK-8338894 [4]: `jhsdb debugd` (allows remote connections with `jhsdb`) as 
well as the `--connect` option of the `jhsdb` subcommands `hsdb` and `clhsdb`

Please note that `jstat` and `jhsdb`'s capabilities for local use remain 
available and there are no plans to change that. It should also be mentioned
that JFR (JDK Flight Recorder) offers a modern alternative for getting remote 
insights into a running HotSpot JVM.

Questions or feedback on these deprecations can be directed at the 
serviceability-dev mailing list [5] (subscription required).

[3] https://bugs.openjdk.org/browse/JDK-8327793
[4] https://bugs.openjdk.org/browse/JDK-8338894
[5] https://mail.openjdk.org/mailman/listinfo/serviceability-dev


# Heads-up - JDK 25: Proposal to Deprecate for Removal `-UseCompressedClassPointers`

## Reducing Code and Test Complexity

Shortly after the adoption of 64-bit architectures the 
`-XX:[-|+]UseCompressedClassPointers` and `-XX:[-|+]UseCompressedOops` 
arguments were added to provide Java users the ability to enable using 32-bit 
references even when on a 64-bit architecture. This reduces memory overhead and 
helps reduce cache misses. You can read more about this here [6].

Removing the `-UseCompressedClassPointers` option would make 
`+UseCompressedClassPointers` the default case and reduce the number of 
configurations that would need to be supported from three to two 
(`+UseCompressedClassPointers` and `+UseCompactObjectHeaders`). This would also 
significantly reduce code complexity as well as testing effort. In addition,
`-UseCompressedClassPointers` does not work well on 64-bit architectures,
as can be seen here [7]; it is suspected there are many more examples.

## Minimal Benefit

Using `-UseCompressedClassPointers` rarely provides any tangible benefit to
Java users. Any historical connection with the `-UseCompressedOops` flag has
long since been removed, and the net result of using 
`-UseCompressedClassPointers` is simply increased memory overhead.

## Reasons to Keep `-UseCompressedClassPointers`

There are currently two reasons to continue supporting 
`-UseCompressedClassPointers`:

- `-UseCompressedClassPointers` works well in 32-bit operating systems. However 
support for 32-bit operating systems is on its way out with JEP 479: 'Remove 
the Windows 32-bit x86 Port' [8] and JEP 501: 'Deprecate the 32-bit x86 Port 
for Removal' [9] which are both in forthcoming JDK 24.
- In cases where more than 5 million classes are loaded. However such cases are 
rare, likely the result of programmer error, and would also mean loading likely 
tens of GBs of non-class data into metaspace as well.

For more on this topic, check this thread [10] on the hotspot-dev mailing list.

The engineers working on this are considering marking 
`-UseCompressedClassPointers` as deprecated for removal in JDK 25 and are 
looking for feedback on the impact this could have. Please direct questions and 
feedback to the lilliput-dev [11] mailing list (registration required).

[6] https://stuefe.de/posts/metaspace/what-is-compressed-class-space/
[7] https://github.com/openjdk/jdk/pull/23053
[8] https://openjdk.org/jeps/479
[9] https://openjdk.org/jeps/501
[10]

Meaningless emptiness and filtering

2025-02-11 Thread David Capwell
Bringing this discussion to dev@ rather than Slack as we try to figure out 
CASSANDRA-20313 and CASSANDRA-19461.

In the type system, we have 2 different (but related) methods:

AbstractType#allowsEmpty - if the user gives empty bytes
(new byte[0]), will the type reject it?
AbstractType#isEmptyValueMeaningless - if the user gives empty bytes, should
this be handled like null?

In practice, there are 2 cases that matter:

allowsEmpty = true AND is meaningless = false - stuff like text and bytes
allowsEmpty = true AND is meaningless = true  - many types, example "int"

What this means is that users are able to use empty bytes when writing to these
types, but this leads to complexity in the filter path, and we are trying to
flesh out the “correct” semantics for SAI.

Simple example:

{code}
@Test
public void test() throws IOException
{
    try (Cluster cluster = Cluster.build(1).start())
    {
        init(cluster);
        cluster.schemaChange(withKeyspace("CREATE TABLE %s.tbl (pk int primary key, v int)"));
        IInvokableInstance node = cluster.get(1);
        // Write empty bytes (not null) into the int column v for every row.
        for (int i = 0; i < 10; i++)
            node.executeInternal(withKeyspace("INSERT INTO %s.tbl (pk, v) VALUES (?, ?)"), i, ByteBufferUtil.EMPTY_BYTE_BUFFER);

        // Filter on v = <empty bytes> and print whatever comes back.
        var qr = node.executeInternalWithResult(withKeyspace("SELECT * FROM %s.tbl WHERE v=? ALLOW FILTERING"), ByteBufferUtil.EMPTY_BYTE_BUFFER);
        StringBuilder sb = new StringBuilder();
        sb.append(qr.names());
        while (qr.hasNext())
        {
            var next = qr.next();
            sb.append('\n').append(next);
        }
        System.out.println(sb);
    }
}
{code}

“Should” this return 10 rows or 0?  In this case, the type is int, and int 
defines empty as meaningless, which means it should act as a null; yet this 
query returns 10 rows, which violates CQL as foo = null == false.

Right now there really isn’t a way to query for NULL (CASSANDRA-10715 is still 
open), but if we did add such a thing we would also need to figure out the 
semantics with regard to these cases.
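
The question above can be made concrete with a small sketch. It is not Cassandra code: `EmptyValueSketch`, `ColumnType`, and both `matches...` methods are hypothetical names, and the two flags only mirror what `allowsEmpty` and `isEmptyValueMeaningless` are described as doing in this message. The first method compares raw bytes, which corresponds to the 10-row result; the second treats a meaningless EMPTY like null, so an equality filter never matches it, which corresponds to the 0-row reading.

```java
import java.nio.ByteBuffer;

/** Hypothetical model of the question in this thread; not Cassandra code. */
final class EmptyValueSketch {

    /** Stand-ins for the two AbstractType flags as described above. */
    enum ColumnType {
        TEXT(true, false), // empty is a real value, e.g. the empty string
        INT(true, true);   // empty is accepted on write but "meaningless"

        final boolean allowsEmpty;
        final boolean emptyIsMeaningless;

        ColumnType(boolean allowsEmpty, boolean emptyIsMeaningless) {
            this.allowsEmpty = allowsEmpty;
            this.emptyIsMeaningless = emptyIsMeaningless;
        }
    }

    static final ByteBuffer EMPTY = ByteBuffer.allocate(0);

    /** Raw byte comparison: EMPTY = EMPTY matches regardless of type. */
    static boolean matchesByBytes(ByteBuffer stored, ByteBuffer filter) {
        return stored != null && filter != null && stored.equals(filter);
    }

    /** True when the value is null, or is empty bytes for a type where empty is meaningless. */
    static boolean isNullLike(ColumnType type, ByteBuffer value) {
        return value == null || (type.emptyIsMeaningless && !value.hasRemaining());
    }

    /** Candidate semantics: a null-like operand on either side makes "=" evaluate to false. */
    static boolean matchesTreatingEmptyAsNull(ColumnType type, ByteBuffer stored, ByteBuffer filter) {
        if (isNullLike(type, stored) || isNullLike(type, filter))
            return false;
        return stored.equals(filter);
    }

    public static void main(String[] args) {
        System.out.println(matchesByBytes(EMPTY, EMPTY));
        System.out.println(matchesTreatingEmptyAsNull(ColumnType.INT, EMPTY, EMPTY));
        System.out.println(matchesTreatingEmptyAsNull(ColumnType.TEXT, EMPTY, EMPTY));
    }
}
```

Under the second reading, text columns keep matching on the empty string while int columns stop matching on EMPTY; which of the two behaviours filtering and SAI should follow is the open question.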

Re: Meaningless emptiness and filtering

2025-02-11 Thread Caleb Rackliffe
If this is only the default in 5.1/6.0/current trunk, so be it, but the minimum 
thing I’d want to build consensus around is no longer allowing empty values in 
filtering/index queries for numeric types. (This applies to actual empty values 
in filtering expressions, but also things like not matching empty values on 
numeric ranges.)

> On Feb 11, 2025, at 6:42 PM, J. D. Jordan  wrote:
> 
> That is the biggest “gotcha” of using the empty value for an int. As soon as 
> you try to use it as an int and not a byte array, all the drivers convert 
> that to a null pointer. If you just “SELECT v0” and then get its value from 
> the result set as a byte array, you get empty bytes, not null.  It is very 
> hard to actually interact with them in the CQL drivers. But it is possible.
> The same being true of CQL functions. You go through “treating it as an int” 
> in them, and lose the ability to have an empty byte array.
> 
>> On Feb 11, 2025, at 4:00 PM, Patrick McFadin  wrote:
>> 
>> You get my vote for the best subject line I've seen this week.
>> 
>>> On Tue, Feb 11, 2025 at 1:20 PM Benedict  wrote:
>>> 
>>> Perhaps we should reify this in the type system? Introduce a MAYBE EMPTY 
>>> modifier for column types, or simply name them eg [int] vs int?
>>> 
>>> I think the problem is today it’s all implicit and - as David says - 
>>> inconsistent. It would be nice to move away from this as the default for a 
>>> variety of reasons, but also nice to make the behaviour well defined for 
>>> those use cases we think we’re supporting.
>>> 
>>> On 11 Feb 2025, at 21:16, David Capwell  wrote:
>>> 
>>> Thanks for the reply!
>>> 
>>> AFAIK this EMPTY stuff goes back to thrift days.
>>> 
>>> 
>>> This is what I was told, but the expected semantics are not clear so my 
>>> goal is to help flesh things out.
>>> 
>>> We let people insert these zero length values back then, so we have to 
>>> support those zero length values existing for ever :/.
>>> 
>>> 
>>> We allow this for some types but not all.  I think where I am coming from 
>>> is write != select, so if we say empty = “no value” or “null” then why does 
>>> select treat it as a value?  Is this the expected behavior?
>>> 
>>> maybe we should be converting such values into a tombstone / NULL
>>> 
>>> 
>>> Tombstones can be purged, whereas empty can’t, so should it?
>>>
>>> Everything besides SAI, including the table based 2i and ALLOW FILTERING,
>>> treat EMPTY as a distinct value which can be inserted and queried on.  We
>>> have supported it in the past, so we should continue to support it into the 
>>> future, even if it is painful to do.
>>> 
>>> 
>>> I guess where I come from here is what semantics do we expect.
>>> 
>>> So let's say v0 is empty bytes int
>>> 
>>> SELECT CAST(v0 AS text)
>>> 
>>> Is this null or empty bytes?  In our project this is null
>>> 
>>> SELECT JSON v0
>>> 
>>> Is this null or empty bytes?  In our project this is null
>>> 
>>> SELECT avg(v0) …
>>> 
>>> Is this null or empty bytes?  In our project this is null
>>> 
>>> So in most places you touch empty bytes we treat it as null, but only in 
>>> filtering do we not.
>>> 
>>> On Feb 11, 2025, at 11:27 AM, Jeremiah Jordan
>>> wrote:
>>> 
>>> AFAIK this EMPTY stuff goes back to thrift days.  We let people insert 
>>> these zero length values back then, so we have to support those zero length 
>>> values existing for ever :/.
>>> 
>>> How useful is such a distinction?  I don’t know.  Is anybody actually doing 
>>> this?  Well Andres brought up 
>>> https://issues.apache.org/jira/browse/CASSANDRA-20313 as a problem because 
>>> we had an end user create an SAI index on a column which contained EMPTY 
>>> values in it.  So people are inserting these into the database.  Would they 
>>> expect to be able to query by EMPTY?  I do not know.
>>> 
>>> This is the first I have heard of the “isEmptyValueMeaningless” setting.  
>>> The meaning of EMPTY to me has always been the same for an Integer or a 
>>> String, “this column has a value of no value” vs NULL which means "this 
>>> column is not set/has no value”.  If we truly want to follow the spirit of 
>>> that setting, then maybe we should be converting such values into a 
>>> tombstone / NULL up front when deserializing them, rather than storing the 
>>> EMPTY byte buffer in the DB?
>>> 
>>> Anyway, I am kind of rambling here.  I am of two minds.
>>> I can see that this does seem like a silly distinction to have for some 
>>> types, so maybe we should just decide that in a CQL world, EMPTY means NULL 
>>> for some types, and actually just make that a tombstone.  Maybe 6.0 would 
>>> be a good major version change to make such a “breaking” behavior change in.
>>> 
>>> I can also see the “don’t screw up the legacy apps” use case.  Everything 
>>> besides SAI, including the table based 2i and ALLOW FILTERING, treat EMPTY 
>>> as a distinct value which can be inserted and queried on.  We have supported
>>> it in the past, so we should continue 

Re: Meaningless emptiness and filtering

2025-02-11 Thread Patrick McFadin
You get my vote for the best subject line I've seen this week.

On Tue, Feb 11, 2025 at 1:20 PM Benedict  wrote:
>
> Perhaps we should reify this in the type system? Introduce a MAYBE EMPTY 
> modifier for column types, or simply name them eg [int] vs int?
>
> I think the problem is today it’s all implicit and - as David says - 
> inconsistent. It would be nice to move away from this as the default for a 
> variety of reasons, but also nice to make the behaviour well defined for 
> those use cases we think we’re supporting.
>
> On 11 Feb 2025, at 21:16, David Capwell  wrote:
>
> Thanks for the reply!
>
> AFAIK this EMPTY stuff goes back to thrift days.
>
>
> This is what I was told, but the expected semantics are not clear so my goal 
> is to help flesh things out.
>
> We let people insert these zero length values back then, so we have to 
> support those zero length values existing for ever :/.
>
>
> We allow this for some types but not all.  I think where I am coming from is 
> write != select, so if we say empty = “no value” or “null” then why does 
> select treat it as a value?  Is this the expected behavior?
>
> maybe we should be converting such values into a tombstone / NULL
>
>
> Tombstones can be purged, whereas empty can’t, so should it?
>
> Everything besides SAI, including the table based 2i and ALLOW FILTERING,
> treat EMPTY as a distinct value which can be inserted and queried on.  We have
> supported it in the past, so we should continue to support it into the 
> future, even if it is painful to do.
>
>
> I guess where I come from here is what semantics do we expect.
>
> So let's say v0 is empty bytes int
>
> SELECT CAST(v0 AS text)
>
> Is this null or empty bytes?  In our project this is null
>
> SELECT JSON v0
>
> Is this null or empty bytes?  In our project this is null
>
> SELECT avg(v0) …
>
> Is this null or empty bytes?  In our project this is null
>
> So in most places you touch empty bytes we treat it as null, but only in 
> filtering do we not.
>
> On Feb 11, 2025, at 11:27 AM, Jeremiah Jordan  
> wrote:
>
> AFAIK this EMPTY stuff goes back to thrift days.  We let people insert these 
> zero length values back then, so we have to support those zero length values 
> existing for ever :/.
>
> How useful is such a distinction?  I don’t know.  Is anybody actually doing 
> this?  Well Andres brought up 
> https://issues.apache.org/jira/browse/CASSANDRA-20313 as a problem because we 
> had an end user create an SAI index on a column which contained EMPTY values 
> in it.  So people are inserting these into the database.  Would they expect 
> to be able to query by EMPTY?  I do not know.
>
> This is the first I have heard of the “isEmptyValueMeaningless” setting.  The 
> meaning of EMPTY to me has always been the same for an Integer or a String, 
> “this column has a value of no value” vs NULL which means "this column is not 
> set/has no value”.  If we truly want to follow the spirit of that setting, 
> then maybe we should be converting such values into a tombstone / NULL up 
> front when deserializing them, rather than storing the EMPTY byte buffer in 
> the DB?
>
> Anyway, I am kind of rambling here.  I am of two minds.
> I can see that this does seem like a silly distinction to have for some 
> types, so maybe we should just decide that in a CQL world, EMPTY means NULL 
> for some types, and actually just make that a tombstone.  Maybe 6.0 would be 
> a good major version change to make such a “breaking” behavior change in.
>
> I can also see the “don’t screw up the legacy apps” use case.  Everything 
> besides SAI, including the table based 2i and ALLOW FILTERING, treat EMPTY as 
> a distinct value which can be inserted and queried on.  We have supported it
> in the past, so we should continue to support it into the future, even if it 
> is painful to do.
>
> Flip a coin and I can argue either side.  So I would love to hear others'
> thoughts to convince me one way or the other.
>
> -Jeremiah
>
>
>
> On Feb 11, 2025 at 12:55:35 PM, Caleb Rackliffe  
> wrote:
>>
>> The case where allowsEmpty == true AND is meaningless == true is especially 
>> confusing. If I could design this from scratch, I would reject writes and 
>> filtering on EMPTY values for int and the other types where meaningless == 
>> true. (In other words, if we allow EMPTY, it is meaningful and queryable. If 
>> we don't, it isn't.) That avoids problems that can't have anything other 
>> than an arbitrary solution, like what we do with < and > for EMPTY for int. 
>> When we add IS [NOT] NULL support, that would preferably NOT match EMPTY 
>> values for the types where empty means something, like strings. For 
>> everything else, EMPTY could be equivalent to null and match IS NULL.
>>
>> The only real way to make SAI compatible with the current behavior is to add 
>> something like a special postings list to its data structures that 
>> corresponds to the rows where the indexed column value is EMPTY.
>>
>> On Tue, Feb 1

Re: [VOTE] Release Apache Cassandra Java Driver 4.19.0

2025-02-11 Thread Bret McGuire
With four +1 votes (3 binding) and zero -1 votes the vote passes.
Thanks all!

On Tue, Feb 11, 2025 at 8:02 AM Josh McKenzie  wrote:

> +1
>
> On Mon, Feb 10, 2025, at 6:34 PM, Nate McCall wrote:
>
> +1
> Verified sigs and artifact coordinates.
>
> On Tue, Feb 11, 2025 at 12:30 PM Brandon Williams 
> wrote:
>
> +1
>
> Checked sha/sig, maven artifacts, built on j8.
>
> Kind Regards,
> Brandon
>
> On Thu, Feb 6, 2025 at 4:34 PM Bret McGuire 
> wrote:
> >
> >Greetings all!  I’m proposing the test build of Cassandra Java Driver
> 4.19.0 for release.
> >
> >
> > sha1: 46444eaabdbd23e9231123198536d070e99aca27
> >
> > Git: https://github.com/apache/cassandra-java-driver/tree/4.19.0
> >
> > Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1364/
> >
> >
> >The vote will be open for 120 hours (longer if needed). Everyone who
> has tested the build is invited to vote. Votes by PMC members are
> considered binding. A vote passes if there are at least three binding +1s
> and no -1's.
> >
> >
> >Thanks!
>
>


Re: Meaningless emptiness and filtering

2025-02-11 Thread J. D. Jordan
That is the biggest “gotcha” of using the empty value for an int. As soon as 
you try to use it as an int and not a byte array, all the drivers convert that 
to a null pointer. If you just “SELECT v0” and then get its value from the 
result set as a byte array, you get empty bytes, not null.  It is very hard to 
actually interact with them in the CQL drivers. But it is possible.
The same being true of CQL functions. You go through “treating it as an int” in 
them, and lose the ability to have an empty byte array.
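
A minimal sketch of the gotcha described above, assuming a hypothetical codec (`EmptyIntDecodeSketch` is not a real driver class, and no real driver API is used here): typed access collapses zero-length bytes to a null `Integer`, while raw access keeps empty bytes distinguishable from null.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of typed vs raw access to an int column; not driver code. */
final class EmptyIntDecodeSketch {

    /** Typed access: null and zero-length bytes both come back as a null Integer. */
    static Integer asInt(ByteBuffer raw) {
        if (raw == null || raw.remaining() == 0)
            return null;
        if (raw.remaining() != 4)
            throw new IllegalArgumentException("expected 4 bytes, got " + raw.remaining());
        return raw.getInt(raw.position()); // absolute read, leaves the buffer untouched
    }

    /** Raw access: bytes are passed through, so empty stays different from null. */
    static byte[] asBytes(ByteBuffer raw) {
        if (raw == null)
            return null;
        byte[] out = new byte[raw.remaining()];
        raw.duplicate().get(out); // copy without moving the caller's position
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.allocate(0);
        System.out.println(asInt(empty));          // typed view of EMPTY
        System.out.println(asBytes(empty).length); // raw view of EMPTY
        System.out.println(asBytes(null));         // raw view of an unset column
    }
}
```

Once a value has passed through the typed path, the empty-versus-null distinction is gone, which is the same loss described for CQL functions.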

> On Feb 11, 2025, at 4:00 PM, Patrick McFadin  wrote:
> 
> You get my vote for the best subject line I've seen this week.
> 
>> On Tue, Feb 11, 2025 at 1:20 PM Benedict  wrote:
>> 
>> Perhaps we should reify this in the type system? Introduce a MAYBE EMPTY 
>> modifier for column types, or simply name them eg [int] vs int?
>> 
>> I think the problem is today it’s all implicit and - as David says - 
>> inconsistent. It would be nice to move away from this as the default for a 
>> variety of reasons, but also nice to make the behaviour well defined for 
>> those use cases we think we’re supporting.
>> 
>> On 11 Feb 2025, at 21:16, David Capwell  wrote:
>> 
>> Thanks for the reply!
>> 
>> AFAIK this EMPTY stuff goes back to thrift days.
>> 
>> 
>> This is what I was told, but the expected semantics are not clear so my goal 
>> is to help flesh things out.
>> 
>> We let people insert these zero length values back then, so we have to 
>> support those zero length values existing for ever :/.
>> 
>> 
>> We allow this for some types but not all.  I think where I am coming from is 
>> write != select, so if we say empty = “no value” or “null” then why does 
>> select treat it as a value?  Is this the expected behavior?
>> 
>> maybe we should be converting such values into a tombstone / NULL
>> 
>> 
>> Tombstones can be purged, whereas empty can’t, so should it?
>>
>> Everything besides SAI, including the table based 2i and ALLOW FILTERING,
>> treat EMPTY as a distinct value which can be inserted and queried on.  We
>> have supported it in the past, so we should continue to support it into the 
>> future, even if it is painful to do.
>> 
>> 
>> I guess where I come from here is what semantics do we expect.
>> 
>> So let's say v0 is empty bytes int
>> 
>> SELECT CAST(v0 AS text)
>> 
>> Is this null or empty bytes?  In our project this is null
>> 
>> SELECT JSON v0
>> 
>> Is this null or empty bytes?  In our project this is null
>> 
>> SELECT avg(v0) …
>> 
>> Is this null or empty bytes?  In our project this is null
>> 
>> So in most places you touch empty bytes we treat it as null, but only in 
>> filtering do we not.
>> 
>> On Feb 11, 2025, at 11:27 AM, Jeremiah Jordan  
>> wrote:
>> 
>> AFAIK this EMPTY stuff goes back to thrift days.  We let people insert these 
>> zero length values back then, so we have to support those zero length values 
>> existing for ever :/.
>> 
>> How useful is such a distinction?  I don’t know.  Is anybody actually doing 
>> this?  Well Andres brought up 
>> https://issues.apache.org/jira/browse/CASSANDRA-20313 as a problem because 
>> we had an end user create an SAI index on a column which contained EMPTY 
>> values in it.  So people are inserting these into the database.  Would they 
>> expect to be able to query by EMPTY?  I do not know.
>> 
>> This is the first I have heard of the “isEmptyValueMeaningless” setting.  
>> The meaning of EMPTY to me has always been the same for an Integer or a 
>> String, “this column has a value of no value” vs NULL which means "this 
>> column is not set/has no value”.  If we truly want to follow the spirit of 
>> that setting, then maybe we should be converting such values into a 
>> tombstone / NULL up front when deserializing them, rather than storing the 
>> EMPTY byte buffer in the DB?
>> 
>> Anyway, I am kind of rambling here.  I am of two minds.
>> I can see that this does seem like a silly distinction to have for some 
>> types, so maybe we should just decide that in a CQL world, EMPTY means NULL 
>> for some types, and actually just make that a tombstone.  Maybe 6.0 would be 
>> a good major version change to make such a “breaking” behavior change in.
>> 
>> I can also see the “don’t screw up the legacy apps” use case.  Everything 
>> besides SAI, including the table based 2i and ALLOW FILTERING, treat EMPTY 
>> as a distinct value which can be inserted and queried on.  We have supported
>> it in the past, so we should continue to support it into the future, even if 
>> it is painful to do.
>> 
>> Flip a coin and I can argue either side.  So I would love to hear others'
>> thoughts to convince me one way or the other.
>> 
>> -Jeremiah
>> 
>> 
>> 
>>> On Feb 11, 2025 at 12:55:35 PM, Caleb Rackliffe  
>>> wrote:
>>> 
>>> The case where allowsEmpty == true AND is meaningless == true is especially 
>>> confusing. If I could design this from scratch, I would reject writes and 
>>> filtering on EMPTY values for int and the other ty