Hi everybody, CEP-29 (CQL NOT Operator) is hitting the grey area of how we want as a community to handle NULL including for things like unfrozen (multi-cell) collections and I would like to make a proposal for moving forward with NULL related issues.
We have currently 2 tickets open about NULL handling (I might have missed others): 1. CASSANDRA-10715 <https://issues.apache.org/jira/browse/CASSANDRA-10715>: Allowing Filtering on NULL 2. CASSANDRA-17762 <https://issues.apache.org/jira/browse/CASSANDRA-17762>: LWT IF col = NULL is inconsistent with SQL NULL We also had previously some discussion on which we touched the subject: - [DISCUSS] LWT UPDATE semantics with + and - when null - CEP-15 multi key transaction syntax In all those tickets and discussions the consensus was to have a behavior similar to SQL. For null comparisons, SQL uses the three-value logic ( https://modern-sql.com/concept/three-valued-logic) introducing the need for IS NULL and IS NOT NULL operators. Those conflict with the col = NULL predicate supported in LWT conditions (CASSANDRA-17762 <https://issues.apache.org/jira/browse/CASSANDRA-17762>). So far, as Cassandra was only using inclusive operators, comparisons were behaving in an expected way. According to three-valued logic NULL CONTAINS 'foo' should return UNKNOWN and the filtering behavior should exclude everything which is not true.Therefore the row should not be returned as expected. With exclusive operators things are more tricky. NULL NOT CONTAINS 'foo' will also return UNKNOWN causing the row to not be returned which might not match people's expectations. This behavior can be even more confusing once you take into account empty and null collections. NOT CONTAINS on an empty collection will return true while it will return UNKNOWN on a NULL collection. Unfortunately, for unfrozen (multicell) collections we are unable to differentiate between an empty and null collection and therefore always treat empty collections as NULL. For predicates such as map[myKey] != 'foo' when myKey is not present the result can also be surprising as it will end up comparing NULL to 'foo' returning once more UNKNOWN and ignoring the row. In order to respect the SQL three-valued logic and be able to allow the user to fetch all the rows which do not contains a specific value we would need support IS NULL, IS NOT NULL and OR to allow query like: WHERE c IS NULL OR c NOT CONTAINS 'foo' / WHERE m IS NULL OR m[myKey] != foo Supporting the three-valued logic makes sense to me even if some behavior might end up being confusing. In which case we can easily fix CASSANDRA-10715 and deprectate support for col = NULL/col != NULL in LWT. What is people's opinion? Should we go for the three-valued logic everywhere? Should we try something else?