Re: CEP-15 multi key transaction syntax

David Capwell Wed, 21 Sep 2022 12:42:35 -0700

Caleb is making great progress on this, and I have been working on CQL fuzz 
testing the new grammar to make sure we flesh out cases quickly; one thing we 
hit was about mixing conditional and non-conditional updates; will use a 
example to better show


BEGIN TRANSACTION
  LET a = (SELECT * FROM ….);
  IF a IS NOT NULL THEN
    UPDATE …;
  END IF
  INSERT INTO ...
COMMIT TRANSACTION

In this case we have 1 UPDATE tied to the IF condition, and one INSERT that 
isn’t… for v1 do we need/want to support this, or is it best for v1 to be 
simple and have all updates tied to conditional when present?

> On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev <[email protected]> 
> wrote:
> 
> I wasn't referring to specific syntax but to the concept. If a SQL dialect 
> (or better, the standard) has a way to select data into a variable, let's 
> adopt it.
> 
> If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is my 
> preference.
> 
> On 8/22/22 19:13, Patrick McFadin wrote:
>> The replies got trashed pretty badly in the responses. 
>> When you say: "Agree it's better to reuse existing syntax than invent new 
>> syntax."
>> 
>> Which syntax are you referring to?
>> 
>> Patrick
>> 
>> 
>> On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Agree it's better to reuse existing syntax than invent new syntax.
>> 
>> On 8/21/22 16:52, Konstantin Osipov wrote:
>> > * Avi Kivity via dev <[email protected] 
>> > <mailto:[email protected]>> [22/08/14 15:59]:
>> >
>> > MySQL supports SELECT <expr_list> INTO <var_list> FROM ... WHERE
>> > ...
>> >
>> > PostgreSQL supports pretty much the same syntax.
>> >
>> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
>> > MySQL/PostgreSQL SELECT ... INTO?
>> >
>> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>> >>> 
>> >>> I’ll do my best to express with my thinking, as well as how I would
>> >>> explain the feature to a user.
>> >>>
>> >>> My mental model for LET statements is that they are simply SELECT
>> >>> statements where the columns that are selected become variables
>> >>> accessible anywhere in the scope of the transaction. That is to say, you
>> >>> should be able to run something like s/LET/SELECT and
>> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>> >>> and produce a valid SELECT statement, and vice versa. Both should
>> >>> perform identically.
>> >>>
>> >>> e.g.
>> >>> SELECT pk AS key, v AS value FROM table
>> >>>
>> >>> =>
>> >>> LET key = pk, value = v FROM table
>> >>
>> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>> >> supports selecting comparisons:
>> >>
>> >>
>> >> $ psql
>> >> psql (14.3)
>> >> Type "help" for help.
>> >>
>> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>> >>   ?column? | ?column? | ?column?
>> >> ----------+----------+----------
>> >>   f        | t        |
>> >> (1 row)
>> >>
>> >>
>> >> Using "=" as a syntactic element in LET would make SELECT and LET
>> >> incompatible once comparisons become valid selectors. Unless they become
>> >> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
>> >> comparison).
>> >>
>> >>
>> >> I personally prefer the nested query syntax:
>> >>
>> >>
>> >>      LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>> >>
>> >>
>> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>> >> immediately recognizable by everyone as a query, LET is not.
>> >>
>> >>
>> >>> Identical form, identical behaviour. Every statement should be directly
>> >>> translatable with some simple text manipulation.
>> >>>
>> >>> We can then make this more powerful for users by simply expanding SELECT
>> >>> statements, e.g. by permitting them to declare constants and tuples in
>> >>> the column results. In this scheme LET x = * is simply syntactic sugar
>> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
>> >>> and 5 all at once, consistently alongside each other.
>> >>>
>> >>> Option 6 is in fact very similar, but is strictly less flexible for the
>> >>> user as they have no way to declare multiple scalar variables without
>> >>> scoping them inside a tuple.
>> >>>
>> >>> e.g.
>> >>> LET key = pk, value = v FROM table
>> >>> IF key > 1 AND value > 1 THEN...
>> >>>
>> >>> =>
>> >>> LET row = SELECT pk AS key, v AS value FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> However, both are expressible in the existing proposal, as if you prefer
>> >>> this naming scheme you can simply write
>> >>>
>> >>> LET row = (pk AS key, v AS value) FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> With respect to auto converting single column results to a scalar, we do
>> >>> need a way for the user to say they care whether the row was null or the
>> >>> column. I think an implicit conversion here could be surprising. However
>> >>> we could implement tuple expressions anyway and let the user explicitly
>> >>> declare v as a tuple as Caleb has suggested for the existing proposal as
>> >>> well.
>> >>>
>> >>> Assigning constants or other values not selected from a table would also
>> >>> be a little clunky:
>> >>>
>> >>> LET v1 = someFunc(), v2 = someOtherFunc(?)
>> >>> IF v1 > 1 AND v2 > 1 THEN…
>> >>>
>> >>> =>
>> >>> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
>> >>> IF row.v1 > 1 AND row.v2 > 1 THEN...
>> >>>
>> >>> That said, the proposals are /close/ to identical, it is just slightly
>> >>> more verbose and slightly less flexible.
>> >>>
>> >>> Which one would be most intuitive to users is hard to predict. It might
>> >>> be that Option 6 would be slightly easier, but I’m unsure if there would
>> >>> be a huge difference.
>> >>>
>> >>>
>> >>>> On 13 Aug 2022, at 16:59, Patrick McFadin <[email protected] 
>> >>>> <mailto:[email protected]>> wrote:
>> >>>>
>> >>>> I'm really happy to see CEP-15 getting closer to a final
>> >>>> implementation. I'm going to walk through my reasoning for your
>> >>>> proposals wrt trying to explain this to somebody new.
>> >>>>
>> >>>> Looking at all the options, the first thing that comes up for me is
>> >>>> the Cassandra project's complicated relationship with NULL.  We have
>> >>>> prior art with EXISTS/NOT EXISTS when creating new tables. IS
>> >>>> NULL/IS NOT NULL is used in materialized views similarly to
>> >>>> proposals 2,4 and 5.
>> >>>>
>> >>>> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
>> >>>>    AS SELECT [ (column_list) ]
>> >>>>    FROM [keyspace_name.]table_name
>> >>>>    [ WHERE column_name IS NOT NULL
>> >>>>    [ AND column_name IS NOT NULL ... ] ]
>> >>>>    [ AND relation [ AND ... ] ]
>> >>>>    PRIMARY KEY ( column_list )
>> >>>>    [ WITH [ table_properties ]
>> >>>>    [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] 
>> >>>> ;
>> >>>>
>> >>>>   Based on that, I believe 1 and 3 would just confuse users, so -1 on
>> >>>> those.
>> >>>>
>> >>>> Trying to explain the difference between row and column operations
>> >>>> with LET, I can't see the difference between a row and column in #2.
>> >>>>
>> >>>> #4 introduces a boolean instead of column names and just adds more
>> >>>> syntax.
>> >>>>
>> >>>> #5 is verbose and, in my opinion, easier to reason when writing a
>> >>>> query. Thinking top down, I need to know if these exact rows and/or
>> >>>> column values exist before changing them, so I'll define them first.
>> >>>> Then I'll iterate over the state I created in my actual changes so I
>> >>>> know I'm changing precisely what I want.
>> >>>>
>> >>>> #5 could use a bit more to be clearer to somebody who doesn't write
>> >>>> CQL queries daily and wouldn't require memorizing subtle
>> >>>> differences. It should be similar to all the other syntax, so
>> >>>> learning a little about CQL will let you move into more without
>> >>>> completely re-learning the new syntax.
>> >>>>
>> >>>> So I propose #6)
>> >>>> BEGIN TRANSACTION
>> >>>> LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
>> >>>> columns
>> >>>> LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>>    SELECT row1, row2
>> >>>> IF row1 IS NULL AND row2.v = 3 THEN
>> >>>>    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>> END IF
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>> I added the SELECT in the LET just so it's straightforward, you are
>> >>>> reading, and it's just like doing a regular select, but you are
>> >>>> assigning it to a variable.
>> >>>>
>> >>>> I removed the confusing 'row1.v'and replaced it with 'row1'I can't
>> >>>> see why you would need the '.v'vs having the complete variable I
>> >>>> created in the statement above.
>> >>>>
>> >>>> EOL
>> >>>>
>> >>>> Patrick
>> >>>>
>> >>>> On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe
>> >>>> <[email protected] <mailto:[email protected]>> wrote:
>> >>>>
>> >>>>      ...and one more option...
>> >>>>
>> >>>>      5.) Introduce tuple assignments, removing all ambiguity around
>> >>>>      row vs. column operations.
>> >>>>
>> >>>>      BEGIN TRANSACTION
>> >>>>        LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
>> >>>>      columns
>> >>>>        LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
>> >>>>      SELECT row1.v, row2.v
>> >>>>        IF row1 IS NULL AND row2.v = 3 THEN
>> >>>>          INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>        END IF
>> >>>>      COMMIT TRANSACTION
>> >>>>
>> >>>>
>> >>>>
>> >>>>      On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe
>> >>>>      <[email protected] <mailto:[email protected]>> wrote:
>> >>>>
>> >>>>          via Benedict, here is a 4th option:
>> >>>>
>> >>>>          4.) Similar to #2, but don't rely on the key element being 
>> >>>> NULL.
>> >>>>
>> >>>>          If the read returns no result, x effectively becomes NULL.
>> >>>>          Otherwise, it remains true/NOT NULL.
>> >>>>
>> >>>>          BEGIN TRANSACTION
>> >>>>            LET x = true FROM ks.tbl WHERE k=0 AND c=0;
>> >>>>            LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>>            SELECT x, row2_v
>> >>>>            IF x IS NULL AND row2_v = 3 THEN
>> >>>>              INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>            END IF
>> >>>>          COMMIT TRANSACTION
>> >>>>
>> >>>>          On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe
>> >>>>          <[email protected] <mailto:[email protected]>> 
>> >>>> wrote:
>> >>>>
>> >>>>              Hello again everyone!
>> >>>>
>> >>>>              I've been working on a prototype
>> >>>>              <https://issues.apache.org/jira/browse/CASSANDRA-17719 
>> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-17719>> in
>> >>>>              CASSANDRA-17719 for a grammar that roughly corresponds to
>> >>>>              what we've agreed on in this thread. One thing that isn't
>> >>>>              immediately obvious to me is how the LET syntax handles
>> >>>>              cases where we want to check for the plain existence of a
>> >>>>              row in IF. For example, in this hybrid of the originally
>> >>>>              proposed syntax and something more like what we've agreed
>> >>>>              on (and the RETURNING just to distinguish between that
>> >>>>              and SELECT), this could be pretty straightforward:
>> >>>>
>> >>>>              BEGIN TRANSACTION
>> >>>>                SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
>> >>>>                SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
>> >>>>                RETURNING row1.v, row2.v
>> >>>>                IF row1 NOT EXISTS AND row2.v = 3 THEN
>> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>                END IF
>> >>>>              COMMIT TRANSACTION
>> >>>>
>> >>>>              The NOT EXISTS operator has row1 to work with. One the
>> >>>>              other hand, w/ the LET syntax and no naming of reads,
>> >>>>              it's not clear what the best solution would be. Here are
>> >>>>              a few possibilities:
>> >>>>
>> >>>>              1.) Provide a few built-in functions that operate on a
>> >>>>              whole result row. If we assume a SQL style IS NULL and IS
>> >>>>              NOT NULL (see my last post here) for operations on
>> >>>>              particular columns, this probably eliminates the need for
>> >>>>              EXISTS/NOT EXISTS as well.
>> >>>>
>> >>>>              BEGIN TRANSACTION
>> >>>>                LET row1_missing = notExists() FROM ks.tbl WHERE k=0
>> >>>>              AND c=0;
>> >>>>                LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>>                SELECT row1_missing, row2_v
>> >>>>                IF row1_missing AND row2_v = 3 THEN
>> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>                END IF
>> >>>>              COMMIT TRANSACTION
>> >>>>
>> >>>>              2.) Assign and check the first primary key element to
>> >>>>              determine whether the row exists.
>> >>>>
>> >>>>              BEGIN TRANSACTION
>> >>>>                LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
>> >>>>                LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>>                SELECT row1_k, row2_v
>> >>>>                IF row1_k IS NULL AND row2_v = 3 THEN
>> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>                END IF
>> >>>>              COMMIT TRANSACTION
>> >>>>
>> >>>>              3.) Reconsider the LET concept toward something that
>> >>>>              allows us to explicitly name our reads again.
>> >>>>
>> >>>>              BEGIN TRANSACTION
>> >>>>                WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
>> >>>>                WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
>> >>>>                SELECT row1.v, row2.v
>> >>>>                IF row1 NOT EXISTS AND row2.v = 3 THEN
>> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>                END IF
>> >>>>              COMMIT TRANSACTION
>> >>>>
>> >>>>              I don't have a strong affinity for any of these, although
>> >>>>              #1 seems the most awkward.
>> >>>>
>> >>>>              Does anyone have any other alternatives? Preference for
>> >>>>              one of the above options?
>> >>>>
>> >>>>              Thanks!
>> >>>>
>> >>>>              On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe
>> >>>>              <[email protected] 
>> >>>> <mailto:[email protected]>> wrote:
>> >>>>
>> >>>>                  Avi brought up an interesting point around NULLness
>> >>>>                  checking inCASSANDRA-17762
>> >>>>                  <https://issues.apache.org/jira/browse/CASSANDRA-17762 
>> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-17762>>...
>> >>>>
>> >>>>                      In SQL, any comparison with NULL is NULL, which
>> >>>>                      is interpreted as FALSE in a condition. To test
>> >>>>                      for NULLness, you use IS NULL or IS NOT NULL. But
>> >>>>                      LWT uses IF col = NULL as a NULLness test. This
>> >>>>                      is likely to confuse people coming from SQL and
>> >>>>                      hamper attempts to extend the dialect.
>> >>>>
>> >>>>
>> >>>>                  We can leave that Jira open to address what to do in
>> >>>>                  the legacy LWT case, but I'd support a SQL-congruent
>> >>>>                  syntax here (IS NULL or IS NOT NULL), where we have
>> >>>>                  something closer to a blank slate.
>> >>>>
>> >>>>                  Thoughts?
>> >>>>
>> >>>>                  On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky
>> >>>>                  <[email protected] <mailto:[email protected]>> wrote:
>> >>>>
>> >>>>                      The new syntax looks great, and I’m really
>> >>>>                      excited to see this coming together.
>> >>>>
>> >>>>                      One piece of feedback on the proposed syntax is
>> >>>>                      around the use of “=“ as a declaration in
>> >>>>                      addition to its current use as an equality
>> >>>>                      operator in a WHERE clause and an assignment
>> >>>>                      operator in an UPDATE:
>> >>>>
>> >>>>                          BEGIN TRANSACTION
>> >>>>                            LET car_miles = miles_driven,
>> >>>>                          car_is_running = is_running FROM cars WHERE
>> >>>>                          model=’pinto’
>> >>>>                            LET user_miles = miles_driven FROM users
>> >>>>                          WHERE name=’blake’
>> >>>>                            SELECT something else from some other table
>> >>>>                            IF NOT car_is_running THEN ABORT
>> >>>>                            UPDATE users SET miles_driven = user_miles
>> >>>>                          + 30 WHERE name='blake';
>> >>>>                            UPDATE cars SET miles_driven = car_miles +
>> >>>>                          30 WHERE model='pinto';
>> >>>>                          COMMIT TRANSACTION
>> >>>>
>> >>>>                      This is supported in languages like PL/pgSQL, but
>> >>>>                      in a normal SQL query kind of local declaration
>> >>>>                      is often expressed as an alias (SELECT col AS
>> >>>>                      new_col), subquery alias (SELECT col) t, or
>> >>>>                      common table expression (WITH t AS (SELECT col)).
>> >>>>
>> >>>>                      Here’s an example of an alternative to the
>> >>>>                      proposed syntax that I’d find more readable:
>> >>>>
>> >>>>                          BEGIN TRANSACTION
>> >>>>                            WITH car_miles, car_is_running AS (SELECT
>> >>>>                          miles_driven, is_running FROM cars WHERE
>> >>>>                          model=’pinto’),
>> >>>>                          user_miles AS (SELECT miles_driven FROM users
>> >>>>                          WHERE name=’blake’)
>> >>>>                            IF NOT car_is_running THEN ABORT
>> >>>>                            UPDATE users SET miles_driven = user_miles
>> >>>>                          + 30 WHERE name='blake';
>> >>>>                          UPDATE cars SET miles_driven = car_miles + 30
>> >>>>                          WHERE model='pinto';
>> >>>>                          COMMIT TRANSACTION
>> >>>>
>> >>>>                      There’s also the option of naming the transaction
>> >>>>                      like a subquery, and supporting LET via AS (this
>> >>>>                      one I’m less sure about but wanted to propose
>> >>>>                      anyway):
>> >>>>
>> >>>>                          BEGIN TRANSACTION t1
>> >>>>                            SELECT miles_driven AS t1.car_miles,
>> >>>>                          is_running AS t1.car_is_running FROM cars
>> >>>>                          WHERE model=’pinto’;
>> >>>>                            SELECT miles_driven AS t1.user_miles FROM
>> >>>>                          users WHERE name=’blake’;
>> >>>>                            IF NOT car_is_running THEN ABORT
>> >>>>                            UPDATE users SET miles_driven = user_miles
>> >>>>                          + 30 WHERE name='blake';
>> >>>>                            UPDATE cars SET miles_driven = car_miles +
>> >>>>                          30 WHERE model='pinto';
>> >>>>                          COMMIT TRANSACTION
>> >>>>
>> >>>>                      This also has the benefit of resolving ambiguity
>> >>>>                      in case of naming conflicts with existing (or
>> >>>>                      future) column names.
>> >>>>
>> >>>>                      --
>> >>>>                      Abe
>> >>>>
>> 
>

Re: CEP-15 multi key transaction syntax

Reply via email to