Modifying the behaviour for IF clauses is a major breaking change that could
have disastrous effects for customers, that would be very hard to audit
applications for on upgrade, so I think that option is a non-starter.
I would support an effort to introduce a new session mode where we make
ourselves more ANSI-SQL like, and introduce IS NULL as a concept, and use it
consistently (along with any other appropriate changes)
FWIW, I think the invalid request is probably thrown because ALLOW FILTERING
isn’t really expected to be used and a NULL primary key column cannot match
since we don’t have JOIN. Since we now have SAI perhaps we have more reason to
support NULL in WHERE clauses but I think without introducing a new mode, if we
want to support it, we have to treat NULL like we do in IF - even if it’s not
how we want it to work.
> On 24 Mar 2025, at 23:45, David Capwell wrote:
>
> In fuzz testing I have found some differences between `WHERE` and `IF`
> clauses that want to get feedback from the broader community.
>
> If you try to query with a `null` we will reject it
>
> ```
> @Test
> public void test() throws IOException
> {
>try (Cluster cluster = Cluster.build(1).start())
>{
>init(cluster);
>cluster.schemaChange(withKeyspace("CREATE TABLE %s.tbl(pk int, ck int,
> v0 int, v1 int, primary key(pk, ck))"));
>var inst = cluster.coordinator(1);
>
>inst.execute(withKeyspace("INSERT INTO %s.tbl (pk, ck, v0) VALUES (?,
> ?, ?)"), ConsistencyLevel.ALL, 0, 0, 0);
>AssertUtils.assertRows(inst.execute(withKeyspace("SELECT * FROM %s.tbl
> WHERE pk=? AND ck=? and v1=? ALLOW FILTERING"), ConsistencyLevel.ALL, 0, 0,
> null),
> rows());
>}
> }
> ```
>
> This fails as follows
>
> ```
> org.apache.cassandra.exceptions.InvalidRequestException: Invalid null value
> for column v1
> ```
>
> But if you do this in the `IF` clause it is accepted
>
> ```
> @Test
> public void test() throws IOException
> {
>try (Cluster cluster = Cluster.build(1).start())
>{
>init(cluster);
>cluster.schemaChange(withKeyspace("CREATE TABLE %s.tbl(pk int, ck int,
> v0 int, v1 int, primary key(pk, ck))"));
>var inst = cluster.coordinator(1);
>
>
>inst.execute(withKeyspace("UPDATE %s.tbl SET v1=0 WHERE pk=0 AND ck=0
> IF v0=?"), ConsistencyLevel.QUORUM, new Object[]{null});
>AssertUtils.assertRows(inst.execute(withKeyspace("SELECT * FROM %s.tbl
> WHERE pk=? AND ck=?"), ConsistencyLevel.SERIAL, 0, 0, null),
> rows());
>}
> }
> ```
>
> CAS accepts this and will apply the `UPDATE` (the row doesn't exist, so `null
> = null => true`; this behavior isn't consistent).
>
> Most of the project treats `null` as something that won't ever match, which
> is consistent with other DBs
>
> ```
> sqlite> select * from employees;
> sqlite> insert into employees (id, name, age, department) values (0, "name",
> 42, "cassandra");
> sqlite> insert into employees (id, name, age) values (1, "name2", 42);
> sqlite> select * from employees where department = null;
> sqlite> sqlite> select * from employees where department is null;
>id = 1
> name = name2
> age = 42
> department = NULL
> sqlite>
> ```
>
> ```
> postgres=# select * from employees where department = null;
> id | name | age | department
> +--+-+
> (0 rows)
> postgres=# select * from employees where department is null;
> id | name | age | department
> +---+-+
> 1 | name2 | 42 |
> (1 row)
> ```
>
> So I guess my main question; is this a bug or a feature?