Caleb is making great progress on this, and I have been working on CQL fuzz
testing the new grammar to make sure we flesh out cases quickly; one thing we
hit was about mixing conditional and non-conditional updates; will use a
example to better show
BEGIN TRANSACTION
LET a = (SELECT * FROM ….);
IF a IS NOT NULL THEN
UPDATE …;
END IF
INSERT INTO ...
COMMIT TRANSACTION
In this case we have 1 UPDATE tied to the IF condition, and one INSERT that
isn’t… for v1 do we need/want to support this, or is it best for v1 to be
simple and have all updates tied to conditional when present?
> On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev <[email protected]>
> wrote:
>
> I wasn't referring to specific syntax but to the concept. If a SQL dialect
> (or better, the standard) has a way to select data into a variable, let's
> adopt it.
>
> If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is my
> preference.
>
> On 8/22/22 19:13, Patrick McFadin wrote:
>> The replies got trashed pretty badly in the responses.
>> When you say: "Agree it's better to reuse existing syntax than invent new
>> syntax."
>>
>> Which syntax are you referring to?
>>
>> Patrick
>>
>>
>> On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev <[email protected]
>> <mailto:[email protected]>> wrote:
>> Agree it's better to reuse existing syntax than invent new syntax.
>>
>> On 8/21/22 16:52, Konstantin Osipov wrote:
>> > * Avi Kivity via dev <[email protected]
>> > <mailto:[email protected]>> [22/08/14 15:59]:
>> >
>> > MySQL supports SELECT <expr_list> INTO <var_list> FROM ... WHERE
>> > ...
>> >
>> > PostgreSQL supports pretty much the same syntax.
>> >
>> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
>> > MySQL/PostgreSQL SELECT ... INTO?
>> >
>> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>> >>>
>> >>> I’ll do my best to express with my thinking, as well as how I would
>> >>> explain the feature to a user.
>> >>>
>> >>> My mental model for LET statements is that they are simply SELECT
>> >>> statements where the columns that are selected become variables
>> >>> accessible anywhere in the scope of the transaction. That is to say, you
>> >>> should be able to run something like s/LET/SELECT and
>> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>> >>> and produce a valid SELECT statement, and vice versa. Both should
>> >>> perform identically.
>> >>>
>> >>> e.g.
>> >>> SELECT pk AS key, v AS value FROM table
>> >>>
>> >>> =>
>> >>> LET key = pk, value = v FROM table
>> >>
>> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>> >> supports selecting comparisons:
>> >>
>> >>
>> >> $ psql
>> >> psql (14.3)
>> >> Type "help" for help.
>> >>
>> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>> >> ?column? | ?column? | ?column?
>> >> ----------+----------+----------
>> >> f | t |
>> >> (1 row)
>> >>
>> >>
>> >> Using "=" as a syntactic element in LET would make SELECT and LET
>> >> incompatible once comparisons become valid selectors. Unless they become
>> >> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
>> >> comparison).
>> >>
>> >>
>> >> I personally prefer the nested query syntax:
>> >>
>> >>
>> >> LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>> >>
>> >>
>> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>> >> immediately recognizable by everyone as a query, LET is not.
>> >>
>> >>
>> >>> Identical form, identical behaviour. Every statement should be directly
>> >>> translatable with some simple text manipulation.
>> >>>
>> >>> We can then make this more powerful for users by simply expanding SELECT
>> >>> statements, e.g. by permitting them to declare constants and tuples in
>> >>> the column results. In this scheme LET x = * is simply syntactic sugar
>> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
>> >>> and 5 all at once, consistently alongside each other.
>> >>>
>> >>> Option 6 is in fact very similar, but is strictly less flexible for the
>> >>> user as they have no way to declare multiple scalar variables without
>> >>> scoping them inside a tuple.
>> >>>
>> >>> e.g.
>> >>> LET key = pk, value = v FROM table
>> >>> IF key > 1 AND value > 1 THEN...
>> >>>
>> >>> =>
>> >>> LET row = SELECT pk AS key, v AS value FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> However, both are expressible in the existing proposal, as if you prefer
>> >>> this naming scheme you can simply write
>> >>>
>> >>> LET row = (pk AS key, v AS value) FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> With respect to auto converting single column results to a scalar, we do
>> >>> need a way for the user to say they care whether the row was null or the
>> >>> column. I think an implicit conversion here could be surprising. However
>> >>> we could implement tuple expressions anyway and let the user explicitly
>> >>> declare v as a tuple as Caleb has suggested for the existing proposal as
>> >>> well.
>> >>>
>> >>> Assigning constants or other values not selected from a table would also
>> >>> be a little clunky:
>> >>>
>> >>> LET v1 = someFunc(), v2 = someOtherFunc(?)
>> >>> IF v1 > 1 AND v2 > 1 THEN…
>> >>>
>> >>> =>
>> >>> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
>> >>> IF row.v1 > 1 AND row.v2 > 1 THEN...
>> >>>
>> >>> That said, the proposals are /close/ to identical, it is just slightly
>> >>> more verbose and slightly less flexible.
>> >>>
>> >>> Which one would be most intuitive to users is hard to predict. It might
>> >>> be that Option 6 would be slightly easier, but I’m unsure if there would
>> >>> be a huge difference.
>> >>>
>> >>>
>> >>>> On 13 Aug 2022, at 16:59, Patrick McFadin <[email protected]
>> >>>> <mailto:[email protected]>> wrote:
>> >>>>
>> >>>> I'm really happy to see CEP-15 getting closer to a final
>> >>>> implementation. I'm going to walk through my reasoning for your
>> >>>> proposals wrt trying to explain this to somebody new.
>> >>>>
>> >>>> Looking at all the options, the first thing that comes up for me is
>> >>>> the Cassandra project's complicated relationship with NULL. We have
>> >>>> prior art with EXISTS/NOT EXISTS when creating new tables. IS
>> >>>> NULL/IS NOT NULL is used in materialized views similarly to
>> >>>> proposals 2,4 and 5.
>> >>>>
>> >>>> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
>> >>>> AS SELECT [ (column_list) ]
>> >>>> FROM [keyspace_name.]table_name
>> >>>> [ WHERE column_name IS NOT NULL
>> >>>> [ AND column_name IS NOT NULL ... ] ]
>> >>>> [ AND relation [ AND ... ] ]
>> >>>> PRIMARY KEY ( column_list )
>> >>>> [ WITH [ table_properties ]
>> >>>> [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ]
>> >>>> ;
>> >>>>
>> >>>> Based on that, I believe 1 and 3 would just confuse users, so -1 on
>> >>>> those.
>> >>>>
>> >>>> Trying to explain the difference between row and column operations
>> >>>> with LET, I can't see the difference between a row and column in #2.
>> >>>>
>> >>>> #4 introduces a boolean instead of column names and just adds more
>> >>>> syntax.
>> >>>>
>> >>>> #5 is verbose and, in my opinion, easier to reason when writing a
>> >>>> query. Thinking top down, I need to know if these exact rows and/or
>> >>>> column values exist before changing them, so I'll define them first.
>> >>>> Then I'll iterate over the state I created in my actual changes so I
>> >>>> know I'm changing precisely what I want.
>> >>>>
>> >>>> #5 could use a bit more to be clearer to somebody who doesn't write
>> >>>> CQL queries daily and wouldn't require memorizing subtle
>> >>>> differences. It should be similar to all the other syntax, so
>> >>>> learning a little about CQL will let you move into more without
>> >>>> completely re-learning the new syntax.
>> >>>>
>> >>>> So I propose #6)
>> >>>> BEGIN TRANSACTION
>> >>>> LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
>> >>>> columns
>> >>>> LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>> SELECT row1, row2
>> >>>> IF row1 IS NULL AND row2.v = 3 THEN
>> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>> END IF
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>> I added the SELECT in the LET just so it's straightforward, you are
>> >>>> reading, and it's just like doing a regular select, but you are
>> >>>> assigning it to a variable.
>> >>>>
>> >>>> I removed the confusing 'row1.v'and replaced it with 'row1'I can't
>> >>>> see why you would need the '.v'vs having the complete variable I
>> >>>> created in the statement above.
>> >>>>
>> >>>> EOL
>> >>>>
>> >>>> Patrick
>> >>>>
>> >>>> On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe
>> >>>> <[email protected] <mailto:[email protected]>> wrote:
>> >>>>
>> >>>> ...and one more option...
>> >>>>
>> >>>> 5.) Introduce tuple assignments, removing all ambiguity around
>> >>>> row vs. column operations.
>> >>>>
>> >>>> BEGIN TRANSACTION
>> >>>> LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
>> >>>> columns
>> >>>> LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
>> >>>> SELECT row1.v, row2.v
>> >>>> IF row1 IS NULL AND row2.v = 3 THEN
>> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>> END IF
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe
>> >>>> <[email protected] <mailto:[email protected]>> wrote:
>> >>>>
>> >>>> via Benedict, here is a 4th option:
>> >>>>
>> >>>> 4.) Similar to #2, but don't rely on the key element being
>> >>>> NULL.
>> >>>>
>> >>>> If the read returns no result, x effectively becomes NULL.
>> >>>> Otherwise, it remains true/NOT NULL.
>> >>>>
>> >>>> BEGIN TRANSACTION
>> >>>> LET x = true FROM ks.tbl WHERE k=0 AND c=0;
>> >>>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>> SELECT x, row2_v
>> >>>> IF x IS NULL AND row2_v = 3 THEN
>> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>> END IF
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>> On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe
>> >>>> <[email protected] <mailto:[email protected]>>
>> >>>> wrote:
>> >>>>
>> >>>> Hello again everyone!
>> >>>>
>> >>>> I've been working on a prototype
>> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-17719
>> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-17719>> in
>> >>>> CASSANDRA-17719 for a grammar that roughly corresponds to
>> >>>> what we've agreed on in this thread. One thing that isn't
>> >>>> immediately obvious to me is how the LET syntax handles
>> >>>> cases where we want to check for the plain existence of a
>> >>>> row in IF. For example, in this hybrid of the originally
>> >>>> proposed syntax and something more like what we've agreed
>> >>>> on (and the RETURNING just to distinguish between that
>> >>>> and SELECT), this could be pretty straightforward:
>> >>>>
>> >>>> BEGIN TRANSACTION
>> >>>> SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
>> >>>> SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
>> >>>> RETURNING row1.v, row2.v
>> >>>> IF row1 NOT EXISTS AND row2.v = 3 THEN
>> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>> END IF
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>> The NOT EXISTS operator has row1 to work with. One the
>> >>>> other hand, w/ the LET syntax and no naming of reads,
>> >>>> it's not clear what the best solution would be. Here are
>> >>>> a few possibilities:
>> >>>>
>> >>>> 1.) Provide a few built-in functions that operate on a
>> >>>> whole result row. If we assume a SQL style IS NULL and IS
>> >>>> NOT NULL (see my last post here) for operations on
>> >>>> particular columns, this probably eliminates the need for
>> >>>> EXISTS/NOT EXISTS as well.
>> >>>>
>> >>>> BEGIN TRANSACTION
>> >>>> LET row1_missing = notExists() FROM ks.tbl WHERE k=0
>> >>>> AND c=0;
>> >>>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>> SELECT row1_missing, row2_v
>> >>>> IF row1_missing AND row2_v = 3 THEN
>> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>> END IF
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>> 2.) Assign and check the first primary key element to
>> >>>> determine whether the row exists.
>> >>>>
>> >>>> BEGIN TRANSACTION
>> >>>> LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
>> >>>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>> SELECT row1_k, row2_v
>> >>>> IF row1_k IS NULL AND row2_v = 3 THEN
>> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>> END IF
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>> 3.) Reconsider the LET concept toward something that
>> >>>> allows us to explicitly name our reads again.
>> >>>>
>> >>>> BEGIN TRANSACTION
>> >>>> WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
>> >>>> WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
>> >>>> SELECT row1.v, row2.v
>> >>>> IF row1 NOT EXISTS AND row2.v = 3 THEN
>> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>> END IF
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>> I don't have a strong affinity for any of these, although
>> >>>> #1 seems the most awkward.
>> >>>>
>> >>>> Does anyone have any other alternatives? Preference for
>> >>>> one of the above options?
>> >>>>
>> >>>> Thanks!
>> >>>>
>> >>>> On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe
>> >>>> <[email protected]
>> >>>> <mailto:[email protected]>> wrote:
>> >>>>
>> >>>> Avi brought up an interesting point around NULLness
>> >>>> checking inCASSANDRA-17762
>> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-17762
>> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-17762>>...
>> >>>>
>> >>>> In SQL, any comparison with NULL is NULL, which
>> >>>> is interpreted as FALSE in a condition. To test
>> >>>> for NULLness, you use IS NULL or IS NOT NULL. But
>> >>>> LWT uses IF col = NULL as a NULLness test. This
>> >>>> is likely to confuse people coming from SQL and
>> >>>> hamper attempts to extend the dialect.
>> >>>>
>> >>>>
>> >>>> We can leave that Jira open to address what to do in
>> >>>> the legacy LWT case, but I'd support a SQL-congruent
>> >>>> syntax here (IS NULL or IS NOT NULL), where we have
>> >>>> something closer to a blank slate.
>> >>>>
>> >>>> Thoughts?
>> >>>>
>> >>>> On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky
>> >>>> <[email protected] <mailto:[email protected]>> wrote:
>> >>>>
>> >>>> The new syntax looks great, and I’m really
>> >>>> excited to see this coming together.
>> >>>>
>> >>>> One piece of feedback on the proposed syntax is
>> >>>> around the use of “=“ as a declaration in
>> >>>> addition to its current use as an equality
>> >>>> operator in a WHERE clause and an assignment
>> >>>> operator in an UPDATE:
>> >>>>
>> >>>> BEGIN TRANSACTION
>> >>>> LET car_miles = miles_driven,
>> >>>> car_is_running = is_running FROM cars WHERE
>> >>>> model=’pinto’
>> >>>> LET user_miles = miles_driven FROM users
>> >>>> WHERE name=’blake’
>> >>>> SELECT something else from some other table
>> >>>> IF NOT car_is_running THEN ABORT
>> >>>> UPDATE users SET miles_driven = user_miles
>> >>>> + 30 WHERE name='blake';
>> >>>> UPDATE cars SET miles_driven = car_miles +
>> >>>> 30 WHERE model='pinto';
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>> This is supported in languages like PL/pgSQL, but
>> >>>> in a normal SQL query kind of local declaration
>> >>>> is often expressed as an alias (SELECT col AS
>> >>>> new_col), subquery alias (SELECT col) t, or
>> >>>> common table expression (WITH t AS (SELECT col)).
>> >>>>
>> >>>> Here’s an example of an alternative to the
>> >>>> proposed syntax that I’d find more readable:
>> >>>>
>> >>>> BEGIN TRANSACTION
>> >>>> WITH car_miles, car_is_running AS (SELECT
>> >>>> miles_driven, is_running FROM cars WHERE
>> >>>> model=’pinto’),
>> >>>> user_miles AS (SELECT miles_driven FROM users
>> >>>> WHERE name=’blake’)
>> >>>> IF NOT car_is_running THEN ABORT
>> >>>> UPDATE users SET miles_driven = user_miles
>> >>>> + 30 WHERE name='blake';
>> >>>> UPDATE cars SET miles_driven = car_miles + 30
>> >>>> WHERE model='pinto';
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>> There’s also the option of naming the transaction
>> >>>> like a subquery, and supporting LET via AS (this
>> >>>> one I’m less sure about but wanted to propose
>> >>>> anyway):
>> >>>>
>> >>>> BEGIN TRANSACTION t1
>> >>>> SELECT miles_driven AS t1.car_miles,
>> >>>> is_running AS t1.car_is_running FROM cars
>> >>>> WHERE model=’pinto’;
>> >>>> SELECT miles_driven AS t1.user_miles FROM
>> >>>> users WHERE name=’blake’;
>> >>>> IF NOT car_is_running THEN ABORT
>> >>>> UPDATE users SET miles_driven = user_miles
>> >>>> + 30 WHERE name='blake';
>> >>>> UPDATE cars SET miles_driven = car_miles +
>> >>>> 30 WHERE model='pinto';
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>> This also has the benefit of resolving ambiguity
>> >>>> in case of naming conflicts with existing (or
>> >>>> future) column names.
>> >>>>
>> >>>> --
>> >>>> Abe
>> >>>>
>>
>