Re: [DISCUSS] Improve Commitlog write path

2022-09-21 Thread Branimir Lambov
Hello Amit,

This paper may be of interest to you:
https://www.vldb.org/pvldb/vol15/p3359-lambov.pdf

We did a range of tests that are similar to your scenario and realized
several things early on:

   - Memory-mapping the commit log in combination with memory-mapped data
   or index files causes long msync delays. This can be solved by switching to
   a compressed log.
   - For smaller mutation sizes, using a large segment size
   practically removes the commit log bottleneck, even with compression and
   even though compression is currently single-threaded.
   - Write performance scaling with available CPU threads is limited by
   memtable congestion. Scaling can be improved by using a sharded memtable
   (introduced with CASSANDRA-17034).

Needing a compressed log to achieve the best write performance is not
ideal, and implementing a non-compressed non-memory-mapped option is
fairly easy, with or without Direct IO. If you are looking for a simple
performance improvement for the commit log, this is where I would start.

Regards,
Branimir

On Tue, Jul 26, 2022 at 3:36 PM Pawar, Amit  wrote:

> [Public]
>
>
>
> Hi Bowen,
>
>
>
> Thanks for your reply. Now it is clear that what are some benefits of this
> patch. I will send it for review once it is ready and hopefully it gets
> accepted.
>
>
>
> Thanks,
>
> Amit
>
>
>
> *From:* Bowen Song via dev 
> *Sent:* Tuesday, July 26, 2022 5:36 PM
> *To:* dev@cassandra.apache.org
> *Subject:* Re: [DISCUSS] Improve Commitlog write path
>
>
>
> [CAUTION: External Email]
>
> Hi Amit,
>
> That's some brilliant tests you have done there. It shows that the
> compaction throughput not only can be a bottleneck on the speed of insert
> operations, but it can also stress the JVM garbage collector. As a result
> of GC pressure, it can cause other things, such as insert, to fail.
>
> Your last statement is correct. The commit log change can be beneficial
> for atypical workloads where large volume of data is getting inserted and
> then expired soon, for example when using the TimeWindowCompactionStrategy
> with short TTL. But I must point out that this kind of atypical usage is
> often an anti-pattern in Cassandra, as Cassandra is a database, not a queue
> or cache system.
>
> This, however, is not saying the commit log change should not be
> introduced. As others have pointed out, it's down to a balancing act
> between the cost and benefit, and it will depend on the code complexity and
> the effect it has on typical workload, such as CPU and JVM heap usage.
> After all, we should prioritise the performance and reliability of typical
> usage before optimising for atypical use cases.
>
> Best,
> Bowen
>
> On 26/07/2022 12:41, Pawar, Amit wrote:
>
> [Public]
>
>
>
> Hi Bowen,
>
>
>
> Thanks for the reply and it helped to identify the failure point. Tested
> compaction throughput with different values and threads active in
> compaction reports “java.lang.OutOfMemoryError: Map failed” error with 1024
> MB/s earlier compared to other values. This shows with lower throughput
> such issues are going to come up not immediately but in days or weeks. Test
> results are given below.
>
>
>
>
> ++---+---+-+
>
> | Records| Compaction Throughput | 5 large files In GB   | Disk usage
> (GB) |
>
>
> ++---+---+-+
>
> | 20 | 8 | Not collected |
> 500 |
>
>
> ++---+---+-+
>
> | 20 | 16| Not collected |
> 500 |
>
>
> ++---+---+-+
>
> | 9  | 64| 3.5,3.5,3.5,3.5,3.5   |
> 273 |
>
>
> ++---+---+-+
>
> | 9  | 128   | 3.5, 3.9,4.9,8.0, 15  |
> 287 |
>
>
> ++---+---+-+
>
> | 9  | 256   | 11,11,12,16,20|
> 359 |
>
>
> ++---+---+-+
>
> | 9  | 512   | 14,19,23,27,28|
> 469 |
>
>
> ++---+---+-+
>
> | 9  | 1024  | 14,18,23,27,28|
> 458 |
>
>
> ++---+---+-+
>
> | 9  | 0 | 6.9,6.9,7.0,28,28 |
> 223 |
>
>
> ++---+---+-+
>
> ||   |
> | |
>
>
> ++---+---+-+
>
>
>
> Issues observed with incr

[DISCUSS] Adding dependency on agrona

2022-09-21 Thread Branimir Lambov
Hi everyone,

CASSANDRA-17240 (Trie memtable implementation) introduces a dependency on
the agrona  library (https://github.com/real-logic/agrona).

Does anyone have any objections to adding this dependency?

Regards,
Branimir


Re: [DISCUSS] Adding dependency on agrona

2022-09-21 Thread Aleksey Yeshchenko
Almost added it twice myself. High quality library with many nifty classes, +1 
from me.

> On 21 Sep 2022, at 13:28, Branimir Lambov  wrote:
> 
> Hi everyone,
> 
> CASSANDRA-17240 (Trie memtable implementation) introduces a dependency on the 
> agrona  library (https://github.com/real-logic/agrona 
> ).
> 
> Does anyone have any objections to adding this dependency?
> 
> Regards,
> Branimir



Re: [DISCUSS] Adding dependency on agrona

2022-09-21 Thread Brandon Williams
+1

On Wed, Sep 21, 2022, 7:28 AM Branimir Lambov  wrote:

> Hi everyone,
>
> CASSANDRA-17240 (Trie memtable implementation) introduces a dependency on
> the agrona  library (https://github.com/real-logic/agrona).
>
> Does anyone have any objections to adding this dependency?
>
> Regards,
> Branimir
>


Re: [DISCUSS] Adding dependency on agrona

2022-09-21 Thread Benedict
In principle no, it’s a high quality library. But it might help to briefly 
outline what it’s used for. I assume it is instead of ByteBuffer? In which case 
it could maybe be worthwhile discussing as a project how we foresee interaction 
with existing buffer machinery, and maybe how we expect our buffer use to 
evolve on the project, as we already have several buffers.

That said, I anticipate our buffer use changing significantly with the 
introduction of value types and native memory improvements coming in future 
Java releases, so my personal inclination is just to accept the dependency.

> On 21 Sep 2022, at 13:29, Branimir Lambov  wrote:
> 
> 
> Hi everyone,
> 
> CASSANDRA-17240 (Trie memtable implementation) introduces a dependency on the 
> agrona  library (https://github.com/real-logic/agrona).
> 
> Does anyone have any objections to adding this dependency?
> 
> Regards,
> Branimir


Re: [DISCUSS] Adding dependency on agrona

2022-09-21 Thread David Capwell
I am +1 to adding, good library, but agree with Benedict it would be good to 
know “why”.

> On Sep 21, 2022, at 5:40 AM, Benedict  wrote:
> 
> In principle no, it’s a high quality library. But it might help to briefly 
> outline what it’s used for. I assume it is instead of ByteBuffer? In which 
> case it could maybe be worthwhile discussing as a project how we foresee 
> interaction with existing buffer machinery, and maybe how we expect our 
> buffer use to evolve on the project, as we already have several buffers.
> 
> That said, I anticipate our buffer use changing significantly with the 
> introduction of value types and native memory improvements coming in future 
> Java releases, so my personal inclination is just to accept the dependency.
> 
>> On 21 Sep 2022, at 13:29, Branimir Lambov  wrote:
>> 
>> 
>> Hi everyone,
>> 
>> CASSANDRA-17240 (Trie memtable implementation) introduces a dependency on 
>> the agrona  library (https://github.com/real-logic/agrona 
>> ).
>> 
>> Does anyone have any objections to adding this dependency?
>> 
>> Regards,
>> Branimir



Re: [DISCUSS] Adding dependency on agrona

2022-09-21 Thread Josh McKenzie
+1 to David and Benedict's sentiments here. Probably a good habit for us to get 
into in general (including the "why" when hitting up ML about a new dep)

On Wed, Sep 21, 2022, at 1:29 PM, David Capwell wrote:
> I am +1 to adding, good library, but agree with Benedict it would be good to 
> know “why”.
> 
>> On Sep 21, 2022, at 5:40 AM, Benedict  wrote:
>> 
>> 
>> In principle no, it’s a high quality library. But it might help to briefly 
>> outline what it’s used for. I assume it is instead of ByteBuffer? In which 
>> case it could maybe be worthwhile discussing as a project how we foresee 
>> interaction with existing buffer machinery, and maybe how we expect our 
>> buffer use to evolve on the project, as we already have several buffers.
>> 
>> That said, I anticipate our buffer use changing significantly with the 
>> introduction of value types and native memory improvements coming in future 
>> Java releases, so my personal inclination is just to accept the dependency.
>> 
>> 
>>> On 21 Sep 2022, at 13:29, Branimir Lambov  wrote:
>>> 
>>> Hi everyone,
>>> 
>>> CASSANDRA-17240 (Trie memtable implementation) introduces a dependency on 
>>> the agrona  library (https://github.com/real-logic/agrona).
>>> 
>>> Does anyone have any objections to adding this dependency?
>>> 
>>> Regards,
>>> Branimir


Re: [DISCUSS] Adding dependency on agrona

2022-09-21 Thread Derek Chen-Becker
Agrona looks like it has quite a bit more than just buffers, so if we add
this as a dependency for the new memtable, would it potentially open up use
of other parts of Agrona (wittingly or not)? Unless I misunderstood, wasn't
part of the new memtable implementation an interface to allow this to be
pluggable? Could we avoid bringing it in as a full dependency for Cassandra
if the trie memtable were packaged separately as a plugin instead of being
included directly?

Cheers,

Derek

On Wed, Sep 21, 2022 at 6:41 AM Benedict  wrote:

> In principle no, it’s a high quality library. But it might help to briefly
> outline what it’s used for. I assume it is instead of ByteBuffer? In which
> case it could maybe be worthwhile discussing as a project how we foresee
> interaction with existing buffer machinery, and maybe how we expect our
> buffer use to evolve on the project, as we already have several buffers.
>
> That said, I anticipate our buffer use changing significantly with the
> introduction of value types and native memory improvements coming in future
> Java releases, so my personal inclination is just to accept the dependency.
>
> On 21 Sep 2022, at 13:29, Branimir Lambov  wrote:
>
> 
> Hi everyone,
>
> CASSANDRA-17240 (Trie memtable implementation) introduces a dependency on
> the agrona  library (https://github.com/real-logic/agrona).
>
> Does anyone have any objections to adding this dependency?
>
> Regards,
> Branimir
>
>

-- 
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---+


Re: CEP-15 multi key transaction syntax

2022-09-21 Thread David Capwell
Caleb is making great progress on this, and I have been working on CQL fuzz 
testing the new grammar to make sure we flesh out cases quickly; one thing we 
hit was about mixing conditional and non-conditional updates; will use a 
example to better show

BEGIN TRANSACTION
  LET a = (SELECT * FROM ….);
  IF a IS NOT NULL THEN
UPDATE …;
  END IF
  INSERT INTO ...
COMMIT TRANSACTION

In this case we have 1 UPDATE tied to the IF condition, and one INSERT that 
isn’t… for v1 do we need/want to support this, or is it best for v1 to be 
simple and have all updates tied to conditional when present?

> On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev  
> wrote:
> 
> I wasn't referring to specific syntax but to the concept. If a SQL dialect 
> (or better, the standard) has a way to select data into a variable, let's 
> adopt it.
> 
> If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is my 
> preference.
> 
> On 8/22/22 19:13, Patrick McFadin wrote:
>> The replies got trashed pretty badly in the responses. 
>> When you say: "Agree it's better to reuse existing syntax than invent new 
>> syntax."
>> 
>> Which syntax are you referring to?
>> 
>> Patrick
>> 
>> 
>> On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev > > wrote:
>> Agree it's better to reuse existing syntax than invent new syntax.
>> 
>> On 8/21/22 16:52, Konstantin Osipov wrote:
>> > * Avi Kivity via dev > > > [22/08/14 15:59]:
>> >
>> > MySQL supports SELECT  INTO  FROM ... WHERE
>> > ...
>> >
>> > PostgreSQL supports pretty much the same syntax.
>> >
>> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
>> > MySQL/PostgreSQL SELECT ... INTO?
>> >
>> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>> >>> 
>> >>> I’ll do my best to express with my thinking, as well as how I would
>> >>> explain the feature to a user.
>> >>>
>> >>> My mental model for LET statements is that they are simply SELECT
>> >>> statements where the columns that are selected become variables
>> >>> accessible anywhere in the scope of the transaction. That is to say, you
>> >>> should be able to run something like s/LET/SELECT and
>> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>> >>> and produce a valid SELECT statement, and vice versa. Both should
>> >>> perform identically.
>> >>>
>> >>> e.g.
>> >>> SELECT pk AS key, v AS value FROM table
>> >>>
>> >>> =>
>> >>> LET key = pk, value = v FROM table
>> >>
>> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>> >> supports selecting comparisons:
>> >>
>> >>
>> >> $ psql
>> >> psql (14.3)
>> >> Type "help" for help.
>> >>
>> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>> >>   ?column? | ?column? | ?column?
>> >> --+--+--
>> >>   f| t|
>> >> (1 row)
>> >>
>> >>
>> >> Using "=" as a syntactic element in LET would make SELECT and LET
>> >> incompatible once comparisons become valid selectors. Unless they become
>> >> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
>> >> comparison).
>> >>
>> >>
>> >> I personally prefer the nested query syntax:
>> >>
>> >>
>> >>  LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>> >>
>> >>
>> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>> >> immediately recognizable by everyone as a query, LET is not.
>> >>
>> >>
>> >>> Identical form, identical behaviour. Every statement should be directly
>> >>> translatable with some simple text manipulation.
>> >>>
>> >>> We can then make this more powerful for users by simply expanding SELECT
>> >>> statements, e.g. by permitting them to declare constants and tuples in
>> >>> the column results. In this scheme LET x = * is simply syntactic sugar
>> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
>> >>> and 5 all at once, consistently alongside each other.
>> >>>
>> >>> Option 6 is in fact very similar, but is strictly less flexible for the
>> >>> user as they have no way to declare multiple scalar variables without
>> >>> scoping them inside a tuple.
>> >>>
>> >>> e.g.
>> >>> LET key = pk, value = v FROM table
>> >>> IF key > 1 AND value > 1 THEN...
>> >>>
>> >>> =>
>> >>> LET row = SELECT pk AS key, v AS value FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> However, both are expressible in the existing proposal, as if you prefer
>> >>> this naming scheme you can simply write
>> >>>
>> >>> LET row = (pk AS key, v AS value) FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> With respect to auto converting single column results to a scalar, we do
>> >>> need a way for the user to say they care whether the row was null or the
>> >>> column. I think an implicit conversion here could be surprising. However
>> >>> we could implement tuple expressions anyway and let the user explicitly
>> >>> declare v as a tuple as Caleb has suggested for

Re: CEP-15 multi key transaction syntax

2022-09-21 Thread Patrick McFadin
I'm also working on different use cases and syntax for Accord :D

I'm +1 on this change and leaving the door open for maybe a few more as we
test this out. It needs to be functionally useful for developers in v1, and
I think it's worth the changes to get it right.

One other thing Caleb and I have been discussing is how, when running a
transaction, the statement returns with no message. In CQLSH you have no
idea if anything happened unless you select from the tables and look for
changes. Even something like LWT adds with "applied=true|false"

Patrick

On Wed, Sep 21, 2022 at 12:42 PM David Capwell  wrote:

> Caleb is making great progress on this, and I have been working on CQL
> fuzz testing the new grammar to make sure we flesh out cases quickly; one
> thing we hit was about mixing conditional and non-conditional updates; will
> use a example to better show
>
> BEGIN TRANSACTION
>   LET a = (SELECT * FROM ….);
>   IF a IS NOT NULL THEN
> UPDATE …;
>   END IF
>   INSERT INTO ...
> COMMIT TRANSACTION
>
> In this case we have 1 UPDATE tied to the IF condition, and one INSERT
> that isn’t… for v1 do we need/want to support this, or is it best for v1 to
> be simple and have all updates tied to conditional when present?
>
> On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev 
> wrote:
>
> I wasn't referring to specific syntax but to the concept. If a SQL dialect
> (or better, the standard) has a way to select data into a variable, let's
> adopt it.
>
> If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is
> my preference.
>
> On 8/22/22 19:13, Patrick McFadin wrote:
>
> The replies got trashed pretty badly in the responses.
> When you say: "Agree it's better to reuse existing syntax than invent new
> syntax."
>
> Which syntax are you referring to?
>
> Patrick
>
>
> On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev <
> dev@cassandra.apache.org> wrote:
>
>> Agree it's better to reuse existing syntax than invent new syntax.
>>
>> On 8/21/22 16:52, Konstantin Osipov wrote:
>> > * Avi Kivity via dev  [22/08/14 15:59]:
>> >
>> > MySQL supports SELECT  INTO  FROM ... WHERE
>> > ...
>> >
>> > PostgreSQL supports pretty much the same syntax.
>> >
>> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
>> > MySQL/PostgreSQL SELECT ... INTO?
>> >
>> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>> >>> 
>> >>> I’ll do my best to express with my thinking, as well as how I would
>> >>> explain the feature to a user.
>> >>>
>> >>> My mental model for LET statements is that they are simply SELECT
>> >>> statements where the columns that are selected become variables
>> >>> accessible anywhere in the scope of the transaction. That is to say,
>> you
>> >>> should be able to run something like s/LET/SELECT and
>> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>> >>> and produce a valid SELECT statement, and vice versa. Both should
>> >>> perform identically.
>> >>>
>> >>> e.g.
>> >>> SELECT pk AS key, v AS value FROM table
>> >>>
>> >>> =>
>> >>> LET key = pk, value = v FROM table
>> >>
>> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>> >> supports selecting comparisons:
>> >>
>> >>
>> >> $ psql
>> >> psql (14.3)
>> >> Type "help" for help.
>> >>
>> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>> >>   ?column? | ?column? | ?column?
>> >> --+--+--
>> >>   f| t|
>> >> (1 row)
>> >>
>> >>
>> >> Using "=" as a syntactic element in LET would make SELECT and LET
>> >> incompatible once comparisons become valid selectors. Unless they
>> become
>> >> mandatory (and then you'd write "LET q = a = b" if you wanted to
>> select a
>> >> comparison).
>> >>
>> >>
>> >> I personally prefer the nested query syntax:
>> >>
>> >>
>> >>  LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>> >>
>> >>
>> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>> >> immediately recognizable by everyone as a query, LET is not.
>> >>
>> >>
>> >>> Identical form, identical behaviour. Every statement should be
>> directly
>> >>> translatable with some simple text manipulation.
>> >>>
>> >>> We can then make this more powerful for users by simply expanding
>> SELECT
>> >>> statements, e.g. by permitting them to declare constants and tuples in
>> >>> the column results. In this scheme LET x = * is simply syntactic sugar
>> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
>> >>> and 5 all at once, consistently alongside each other.
>> >>>
>> >>> Option 6 is in fact very similar, but is strictly less flexible for
>> the
>> >>> user as they have no way to declare multiple scalar variables without
>> >>> scoping them inside a tuple.
>> >>>
>> >>> e.g.
>> >>> LET key = pk, value = v FROM table
>> >>> IF key > 1 AND value > 1 THEN...
>> >>>
>> >>> =>
>> >>> LET row = SELECT pk AS key, v AS value FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> However, bot

Re: [VOTE] CEP-20: Dynamic Data Masking

2022-09-21 Thread Miklosovic, Stefan
+1

Od: Andrés de la Peña 
Odoslané: pondelok 19. septembra 2022 13:39
Komu: dev@cassandra.apache.org 
Predmet: [VOTE] CEP-20: Dynamic Data Masking

NetApp Security WARNING: This is an external email. Do not click links or open 
attachments unless you recognize the sender and know the content is safe.



Hi everyone,

I'd like to propose CEP-20 for approval.

Proposal: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking
Discussion: https://lists.apache.org/thread/qsmxsymozymy6dy9tp5xw9gn5fhz9nt4

The vote will be open for 72 hours.
Votes by committers are considered binding.
A vote passes if there are at least three binding +1s and no binding vetoes.

Thank you,


Re: CEP-15 multi key transaction syntax

2022-09-21 Thread Benedict
My inclination is not to support this until we support arbitrary numbers of IF 
statements. It’s one too many arbitrary restrictions and it potentially gets 
confusing.

But I don’t feel super strongly about it.

> On 21 Sep 2022, at 20:56, Patrick McFadin  wrote:
> 
> 
> I'm also working on different use cases and syntax for Accord :D
> 
> I'm +1 on this change and leaving the door open for maybe a few more as we 
> test this out. It needs to be functionally useful for developers in v1, and I 
> think it's worth the changes to get it right. 
> 
> One other thing Caleb and I have been discussing is how, when running a 
> transaction, the statement returns with no message. In CQLSH you have no idea 
> if anything happened unless you select from the tables and look for changes. 
> Even something like LWT adds with "applied=true|false" 
> 
> Patrick
> 
>> On Wed, Sep 21, 2022 at 12:42 PM David Capwell  wrote:
>> Caleb is making great progress on this, and I have been working on CQL fuzz 
>> testing the new grammar to make sure we flesh out cases quickly; one thing 
>> we hit was about mixing conditional and non-conditional updates; will use a 
>> example to better show
>> 
>> BEGIN TRANSACTION
>>   LET a = (SELECT * FROM ….);
>>   IF a IS NOT NULL THEN
>> UPDATE …;
>>   END IF
>>   INSERT INTO ...
>> COMMIT TRANSACTION
>> 
>> In this case we have 1 UPDATE tied to the IF condition, and one INSERT that 
>> isn’t… for v1 do we need/want to support this, or is it best for v1 to be 
>> simple and have all updates tied to conditional when present?
>> 
>>> On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev  
>>> wrote:
>>> 
>>> I wasn't referring to specific syntax but to the concept. If a SQL dialect 
>>> (or better, the standard) has a way to select data into a variable, let's 
>>> adopt it.
>>> 
>>> If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is 
>>> my preference.
>>> 
>>> On 8/22/22 19:13, Patrick McFadin wrote:
 The replies got trashed pretty badly in the responses. 
 When you say: "Agree it's better to reuse existing syntax than invent new 
 syntax."
 
 Which syntax are you referring to?
 
 Patrick
 
 
 On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev 
  wrote:
> Agree it's better to reuse existing syntax than invent new syntax.
> 
> On 8/21/22 16:52, Konstantin Osipov wrote:
> > * Avi Kivity via dev  [22/08/14 15:59]:
> >
> > MySQL supports SELECT  INTO  FROM ... WHERE
> > ...
> >
> > PostgreSQL supports pretty much the same syntax.
> >
> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
> > MySQL/PostgreSQL SELECT ... INTO?
> >
> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
> >>> 
> >>> I’ll do my best to express with my thinking, as well as how I would
> >>> explain the feature to a user.
> >>>
> >>> My mental model for LET statements is that they are simply SELECT
> >>> statements where the columns that are selected become variables
> >>> accessible anywhere in the scope of the transaction. That is to say, 
> >>> you
> >>> should be able to run something like s/LET/SELECT and
> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
> >>> and produce a valid SELECT statement, and vice versa. Both should
> >>> perform identically.
> >>>
> >>> e.g.
> >>> SELECT pk AS key, v AS value FROM table
> >>>
> >>> =>
> >>> LET key = pk, value = v FROM table
> >>
> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
> >> supports selecting comparisons:
> >>
> >>
> >> $ psql
> >> psql (14.3)
> >> Type "help" for help.
> >>
> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
> >>   ?column? | ?column? | ?column?
> >> --+--+--
> >>   f| t|
> >> (1 row)
> >>
> >>
> >> Using "=" as a syntactic element in LET would make SELECT and LET
> >> incompatible once comparisons become valid selectors. Unless they 
> >> become
> >> mandatory (and then you'd write "LET q = a = b" if you wanted to 
> >> select a
> >> comparison).
> >>
> >>
> >> I personally prefer the nested query syntax:
> >>
> >>
> >>  LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
> >>
> >>
> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
> >> immediately recognizable by everyone as a query, LET is not.
> >>
> >>
> >>> Identical form, identical behaviour. Every statement should be 
> >>> directly
> >>> translatable with some simple text manipulation.
> >>>
> >>> We can then make this more powerful for users by simply expanding 
> >>> SELECT
> >>> statements, e.g. by permitting them to declare constants and tuples in
> >>> the column results. In this scheme LET x = 

Re: CEP-15 multi key transaction syntax

2022-09-21 Thread Jeff Jirsa
I expect that a lot of use cases will update M and insert into N tables
based on one condition, so if that's a problem with the grammar today, I
think it'd probably be worth the time to sort that out?



On Wed, Sep 21, 2022 at 12:42 PM David Capwell  wrote:

> Caleb is making great progress on this, and I have been working on CQL
> fuzz testing the new grammar to make sure we flesh out cases quickly; one
> thing we hit was about mixing conditional and non-conditional updates; will
> use a example to better show
>
> BEGIN TRANSACTION
>   LET a = (SELECT * FROM ….);
>   IF a IS NOT NULL THEN
> UPDATE …;
>   END IF
>   INSERT INTO ...
> COMMIT TRANSACTION
>
> In this case we have 1 UPDATE tied to the IF condition, and one INSERT
> that isn’t… for v1 do we need/want to support this, or is it best for v1 to
> be simple and have all updates tied to conditional when present?
>
> On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev 
> wrote:
>
> I wasn't referring to specific syntax but to the concept. If a SQL dialect
> (or better, the standard) has a way to select data into a variable, let's
> adopt it.
>
> If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is
> my preference.
>
> On 8/22/22 19:13, Patrick McFadin wrote:
>
> The replies got trashed pretty badly in the responses.
> When you say: "Agree it's better to reuse existing syntax than invent new
> syntax."
>
> Which syntax are you referring to?
>
> Patrick
>
>
> On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev <
> dev@cassandra.apache.org> wrote:
>
>> Agree it's better to reuse existing syntax than invent new syntax.
>>
>> On 8/21/22 16:52, Konstantin Osipov wrote:
>> > * Avi Kivity via dev  [22/08/14 15:59]:
>> >
>> > MySQL supports SELECT  INTO  FROM ... WHERE
>> > ...
>> >
>> > PostgreSQL supports pretty much the same syntax.
>> >
>> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
>> > MySQL/PostgreSQL SELECT ... INTO?
>> >
>> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>> >>> 
>> >>> I’ll do my best to express with my thinking, as well as how I would
>> >>> explain the feature to a user.
>> >>>
>> >>> My mental model for LET statements is that they are simply SELECT
>> >>> statements where the columns that are selected become variables
>> >>> accessible anywhere in the scope of the transaction. That is to say,
>> you
>> >>> should be able to run something like s/LET/SELECT and
>> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>> >>> and produce a valid SELECT statement, and vice versa. Both should
>> >>> perform identically.
>> >>>
>> >>> e.g.
>> >>> SELECT pk AS key, v AS value FROM table
>> >>>
>> >>> =>
>> >>> LET key = pk, value = v FROM table
>> >>
>> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>> >> supports selecting comparisons:
>> >>
>> >>
>> >> $ psql
>> >> psql (14.3)
>> >> Type "help" for help.
>> >>
>> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>> >>   ?column? | ?column? | ?column?
>> >> --+--+--
>> >>   f| t|
>> >> (1 row)
>> >>
>> >>
>> >> Using "=" as a syntactic element in LET would make SELECT and LET
>> >> incompatible once comparisons become valid selectors. Unless they
>> become
>> >> mandatory (and then you'd write "LET q = a = b" if you wanted to
>> select a
>> >> comparison).
>> >>
>> >>
>> >> I personally prefer the nested query syntax:
>> >>
>> >>
>> >>  LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>> >>
>> >>
>> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>> >> immediately recognizable by everyone as a query, LET is not.
>> >>
>> >>
>> >>> Identical form, identical behaviour. Every statement should be
>> directly
>> >>> translatable with some simple text manipulation.
>> >>>
>> >>> We can then make this more powerful for users by simply expanding
>> SELECT
>> >>> statements, e.g. by permitting them to declare constants and tuples in
>> >>> the column results. In this scheme LET x = * is simply syntactic sugar
>> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
>> >>> and 5 all at once, consistently alongside each other.
>> >>>
>> >>> Option 6 is in fact very similar, but is strictly less flexible for
>> the
>> >>> user as they have no way to declare multiple scalar variables without
>> >>> scoping them inside a tuple.
>> >>>
>> >>> e.g.
>> >>> LET key = pk, value = v FROM table
>> >>> IF key > 1 AND value > 1 THEN...
>> >>>
>> >>> =>
>> >>> LET row = SELECT pk AS key, v AS value FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> However, both are expressible in the existing proposal, as if you
>> prefer
>> >>> this naming scheme you can simply write
>> >>>
>> >>> LET row = (pk AS key, v AS value) FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> With respect to auto converting single column results to a scalar, we
>> do
>> >>> need a way for the user to say they care 

Re: CEP-15 multi key transaction syntax

2022-09-21 Thread Benedict
Not quite sure I follow, but the syntax we agreed permits you to update as many 
tables as you like with a single condition, or with no condition, but not to 
mix both conditional and unconditional updates in a single transaction.

My preference is to keep this simple until we permit arbitrarily complex logic, 
ie sequences of (potentially nested) ifs and unconditional updates.

> On 21 Sep 2022, at 21:04, Jeff Jirsa  wrote:
> 
> 
> I expect that a lot of use cases will update M and insert into N tables based 
> on one condition, so if that's a problem with the grammar today, I think it'd 
> probably be worth the time to sort that out? 
> 
> 
> 
>> On Wed, Sep 21, 2022 at 12:42 PM David Capwell  wrote:
>> Caleb is making great progress on this, and I have been working on CQL fuzz 
>> testing the new grammar to make sure we flesh out cases quickly; one thing 
>> we hit was about mixing conditional and non-conditional updates; will use a 
>> example to better show
>> 
>> BEGIN TRANSACTION
>>   LET a = (SELECT * FROM ….);
>>   IF a IS NOT NULL THEN
>> UPDATE …;
>>   END IF
>>   INSERT INTO ...
>> COMMIT TRANSACTION
>> 
>> In this case we have 1 UPDATE tied to the IF condition, and one INSERT that 
>> isn’t… for v1 do we need/want to support this, or is it best for v1 to be 
>> simple and have all updates tied to conditional when present?
>> 
>>> On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev  
>>> wrote:
>>> 
>>> I wasn't referring to specific syntax but to the concept. If a SQL dialect 
>>> (or better, the standard) has a way to select data into a variable, let's 
>>> adopt it.
>>> 
>>> If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is 
>>> my preference.
>>> 
>>> On 8/22/22 19:13, Patrick McFadin wrote:
 The replies got trashed pretty badly in the responses. 
 When you say: "Agree it's better to reuse existing syntax than invent new 
 syntax."
 
 Which syntax are you referring to?
 
 Patrick
 
 
 On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev 
  wrote:
> Agree it's better to reuse existing syntax than invent new syntax.
> 
> On 8/21/22 16:52, Konstantin Osipov wrote:
> > * Avi Kivity via dev  [22/08/14 15:59]:
> >
> > MySQL supports SELECT  INTO  FROM ... WHERE
> > ...
> >
> > PostgreSQL supports pretty much the same syntax.
> >
> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
> > MySQL/PostgreSQL SELECT ... INTO?
> >
> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
> >>> 
> >>> I’ll do my best to express with my thinking, as well as how I would
> >>> explain the feature to a user.
> >>>
> >>> My mental model for LET statements is that they are simply SELECT
> >>> statements where the columns that are selected become variables
> >>> accessible anywhere in the scope of the transaction. That is to say, 
> >>> you
> >>> should be able to run something like s/LET/SELECT and
> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
> >>> and produce a valid SELECT statement, and vice versa. Both should
> >>> perform identically.
> >>>
> >>> e.g.
> >>> SELECT pk AS key, v AS value FROM table
> >>>
> >>> =>
> >>> LET key = pk, value = v FROM table
> >>
> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
> >> supports selecting comparisons:
> >>
> >>
> >> $ psql
> >> psql (14.3)
> >> Type "help" for help.
> >>
> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
> >>   ?column? | ?column? | ?column?
> >> --+--+--
> >>   f| t|
> >> (1 row)
> >>
> >>
> >> Using "=" as a syntactic element in LET would make SELECT and LET
> >> incompatible once comparisons become valid selectors. Unless they 
> >> become
> >> mandatory (and then you'd write "LET q = a = b" if you wanted to 
> >> select a
> >> comparison).
> >>
> >>
> >> I personally prefer the nested query syntax:
> >>
> >>
> >>  LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
> >>
> >>
> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
> >> immediately recognizable by everyone as a query, LET is not.
> >>
> >>
> >>> Identical form, identical behaviour. Every statement should be 
> >>> directly
> >>> translatable with some simple text manipulation.
> >>>
> >>> We can then make this more powerful for users by simply expanding 
> >>> SELECT
> >>> statements, e.g. by permitting them to declare constants and tuples in
> >>> the column results. In this scheme LET x = * is simply syntactic sugar
> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
> >>> and 5 all at once, consistently alongside each other.
> >>>
> >>> Option 6 is in 

Re: CEP-15 multi key transaction syntax

2022-09-21 Thread David Capwell
> I expect that a lot of use cases will update M and insert into N tables based 
> on one condition


Jeff, the issue is a scope issue

— works fine today
IF …
  UPDATE ….;
  INSERT …;
END IF

— also works today just fine; no condition is used with the mutations
UPDATE ….;
INSERT …;

— does not work today
IF
  UPDATE ….;
  INSERT …;
END IF
— this breaks the parser as it does not belong to the above IF block
INSERT …;

So its not that updating multiple tables is a problem, its just that mapping 
mutations to conditions is purely on if a condition exists today and the parser 
assumes this as well… so all mutations are tied to a condition if present, else 
all mutations have no conditions…. The parser helps enforce this by failing if 
you mix.

> My inclination is not to support this until we support arbitrary numbers of 
> IF statements. 

That is my feeling as well.  I am cool with v1 having this limitation as it 
does NOT block future versions to enhance the syntax, and when we can support 
multiple IF then we need to decouple this current implementation detail… so 
easier to deal with then.

> On Sep 21, 2022, at 1:22 PM, Benedict  wrote:
> 
> Not quite sure I follow, but the syntax we agreed permits you to update as 
> many tables as you like with a single condition, or with no condition, but 
> not to mix both conditional and unconditional updates in a single transaction.
> 
> My preference is to keep this simple until we permit arbitrarily complex 
> logic, ie sequences of (potentially nested) ifs and unconditional updates.
> 
>> On 21 Sep 2022, at 21:04, Jeff Jirsa  wrote:
>> 
>> 
>> I expect that a lot of use cases will update M and insert into N tables 
>> based on one condition, so if that's a problem with the grammar today, I 
>> think it'd probably be worth the time to sort that out? 
>> 
>> 
>> 
>> On Wed, Sep 21, 2022 at 12:42 PM David Capwell > > wrote:
>> Caleb is making great progress on this, and I have been working on CQL fuzz 
>> testing the new grammar to make sure we flesh out cases quickly; one thing 
>> we hit was about mixing conditional and non-conditional updates; will use a 
>> example to better show
>> 
>> BEGIN TRANSACTION
>>   LET a = (SELECT * FROM ….);
>>   IF a IS NOT NULL THEN
>> UPDATE …;
>>   END IF
>>   INSERT INTO ...
>> COMMIT TRANSACTION
>> 
>> In this case we have 1 UPDATE tied to the IF condition, and one INSERT that 
>> isn’t… for v1 do we need/want to support this, or is it best for v1 to be 
>> simple and have all updates tied to conditional when present?
>> 
>>> On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev >> > wrote:
>>> 
>>> I wasn't referring to specific syntax but to the concept. If a SQL dialect 
>>> (or better, the standard) has a way to select data into a variable, let's 
>>> adopt it.
>>> 
>>> If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is 
>>> my preference.
>>> 
>>> On 8/22/22 19:13, Patrick McFadin wrote:
 The replies got trashed pretty badly in the responses. 
 When you say: "Agree it's better to reuse existing syntax than invent new 
 syntax."
 
 Which syntax are you referring to?
 
 Patrick
 
 
 On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev 
 mailto:dev@cassandra.apache.org>> wrote:
 Agree it's better to reuse existing syntax than invent new syntax.
 
 On 8/21/22 16:52, Konstantin Osipov wrote:
 > * Avi Kivity via dev >>> > > [22/08/14 15:59]:
 >
 > MySQL supports SELECT  INTO  FROM ... WHERE
 > ...
 >
 > PostgreSQL supports pretty much the same syntax.
 >
 > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
 > MySQL/PostgreSQL SELECT ... INTO?
 >
 >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
 >>> 
 >>> I’ll do my best to express with my thinking, as well as how I would
 >>> explain the feature to a user.
 >>>
 >>> My mental model for LET statements is that they are simply SELECT
 >>> statements where the columns that are selected become variables
 >>> accessible anywhere in the scope of the transaction. That is to say, 
 >>> you
 >>> should be able to run something like s/LET/SELECT and
 >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
 >>> and produce a valid SELECT statement, and vice versa. Both should
 >>> perform identically.
 >>>
 >>> e.g.
 >>> SELECT pk AS key, v AS value FROM table
 >>>
 >>> =>
 >>> LET key = pk, value = v FROM table
 >>
 >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
 >> supports selecting comparisons:
 >>
 >>
 >> $ psql
 >> psql (14.3)
 >> Type "help" for help.
 >>
 >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
 >>   ?column? | ?column? | ?column?
 >> --+--+--
 >>   f