Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-24 Thread Yifan Cai
Thanks for working on this!

Another bikeshed I noticed is the project logo.

Currently, all of them share the same one as Sidecar. The subprojects can
be styled up.  :p

- Yifan

On Wed, Oct 23, 2024 at 9:46 AM Brandon Williams  wrote:

> https://issues.apache.org/jira/projects/CASSPYTHON/
> https://issues.apache.org/jira/projects/CASSJAVA/
> https://issues.apache.org/jira/projects/CASSGO/
>
> are now live.  If you know of any issues to move there, please do so.
>
> Kind Regards,
> Brandon
>
> On Wed, Oct 23, 2024 at 6:59 AM Brandon Williams  wrote:
> >
> > If nobody objects I will be creating the CASS jira projects later
> today.
> >
> > Kind Regards,
> > Brandon
> >
> > On Tue, Oct 22, 2024 at 10:35 AM Ekaterina Dimitrova
> >  wrote:
> > >
> > > Honestly, counting the letters was also a thing that happened to me
> but I should admit that even with CASSANDRAANALYTICS we count the As…
> > >
> > > My preference is CASSX
> > >
> > > Seems shorter and less painful to read to me as a user.
> > >
> > > Thanks
> > >
> > > On Tue, 22 Oct 2024 at 11:18, Patrick McFadin 
> wrote:
> > >>
> > >> CASS + NAME is my +1
> > >>
> > >> TBH rarely will this be typed. Just copied and pasted. It has to be
> > >> clear that naming is different from the other projects and I think we
> > >> get it either way.
> > >>
> > >> On Tue, Oct 22, 2024 at 8:15 AM Štefan Miklošovič
> > >>  wrote:
> > >> >
> > >> > Something like this?
> > >> >
> > >> > CASSANDRA
> > >> > CASSPYTHON
> > >> > CASSGO
> > >> > CASSJAVA
> > >> > CASSSIDECAR
> > >> > CASSANALYTICS
> > >> >
> > >> > if we expand it would be like
> > >> >
> > >> > CASSANDRA
> > >> > CASSANDRAPYTHON
> > >> > CASSANDRAGO
> > >> > CASSANDRAJAVA
> > >> > CASSANDRASIDECAR
> > >> > CASSANDRAANALYTICS
> > >> >
> > >> > I don't know ... the first form seems fine to me but that triple S
> in CASSSIDECAR is strange. I just find myself counting S's when I type it.
> > >> >
> > >> > Up to you guys. I am fine with either.
> > >> >
> > >> > On Tue, Oct 22, 2024 at 5:01 PM Brandon Williams 
> wrote:
> > >> >>
> > >> >> I don't think underscore is an option from selfserve anyway.  If we
> > >> >> have to stick everything together then I think having fewer things
> is
> > >> >> better, so we could drop the 'driver' and just name things like
> > >> >> CASSPYTHON.  WDYT?
> > >> >>
> > >> >> Kind Regards,
> > >> >> Brandon
> > >> >>
> > >> >> On Tue, Oct 22, 2024 at 9:33 AM Štefan Miklošovič
> > >> >>  wrote:
> > >> >> >
> > >> >> > So we will have stuff like
> > >> >> >
> > >> >> > CASS_DRIVER_PYTHON and all tickets in CHANGES.txt as well as in
> the commit messages will be like
> > >> >> >
> > >> >> > CASS_DRIVER_PYTHON-1234
> > >> >> >
> > >> >> > I checked (1) and there is not a single one which has
> underscores in its name, now THAT would be a precedent, wouldn't it ...
> > >> >> >
> > >> >> > (1)
> https://issues.apache.org/jira/secure/BrowseProjects.jspa?selectedCategory=all&selectedProjectType=all
> > >> >> >
> > >> >> > On Tue, Oct 22, 2024 at 4:17 PM Martin Sucha <
> martin.su...@kiwi.com> wrote:
> > >> >> >>
> > >> >> >> This seems to be relevant documentation:
> https://confluence.atlassian.com/adminjiraserver/changing-the-project-key-format-938847081.html
> > >> >> >>
> > >> >> >> Martin
> > >> >> >>
> > >> >> >> 
>
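
The key-format point in this thread can be illustrated with a small sketch. It assumes Jira's documented default project key pattern (two or more uppercase letters, no underscores or digits); the helper names is_valid_project_key and find_issue_refs are hypothetical and are not part of any Jira or Cassandra tooling.

```python
import re

# Assumption: Jira's documented default project key pattern,
# i.e. two or more uppercase ASCII letters and nothing else.
DEFAULT_KEY_PATTERN = re.compile(r"[A-Z][A-Z]+")

# Issue references as they might appear in CHANGES.txt or commit messages.
ISSUE_REF = re.compile(r"\b([A-Z][A-Z]+)-(\d+)\b")


def is_valid_project_key(key: str) -> bool:
    """Return True if the whole key matches the assumed default pattern."""
    return DEFAULT_KEY_PATTERN.fullmatch(key) is not None


def find_issue_refs(text: str):
    """Return (project_key, issue_number) pairs found in free text."""
    return [(m.group(1), int(m.group(2))) for m in ISSUE_REF.finditer(text)]


if __name__ == "__main__":
    for key in ("CASSPYTHON", "CASSSIDECAR", "CASS_DRIVER_PYTHON"):
        print(key, is_valid_project_key(key))
    print(find_issue_refs("Fix paging (CASSPYTHON-1234), see CASSANDRA-19000"))
```

Under that assumed pattern, a key such as CASS_DRIVER_PYTHON is rejected, while the concatenated forms pass.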


Re: [VOTE] CEP-42: Constraints Framework

2024-10-24 Thread Yifan Cai
Hello, everyone.

I’ve been reviewing the patch for the constraints framework, and I believe
there are several aspects outlined in CEP-42 that warrant reconsideration.
I’d like to bring these points up for discussion.

*1. New Reserved Keyword*

The patch introduces a new reserved keyword, "CONSTRAINT." Since reserved
keywords cannot be used as identifiers unless quoted, this can complicate
data definition declarations. We should aim to avoid adding new reserved
keywords where possible. Here are a couple of alternatives:

1.1 *Inline Constraint Definition*

We could eliminate the keyword "CONSTRAINT." Instead, similar to data
masking, constraints could be defined using "CONSTRAINED WITH." For
example, in the following code, r_value_range_lower_bound and
r_value_range_upper_bound are constraint names, followed immediately by
their expressions, with multiple constraints connected using "AND".

CREATE TABLE rgb (
  name text PRIMARY KEY,
  r int CONSTRAINED WITH r_value_range_lower_bound CHECK r >= 0 AND
r_value_range_upper_bound CHECK r < 256,
  ...
);

1.2 *Special Symbol*

Another option is to use a special symbol to differentiate from
identifiers, such as "@CONSTRAINT." However, since there is currently no
annotation-like concept in CQL, this might confuse users.

CREATE TABLE rgb (
  name text PRIMARY KEY,
  r int,
  ...
  @CONSTRAINT r_value_range_lower_bound CHECK r >= 0,
  @CONSTRAINT r_value_range_upper_bound CHECK r < 256,
  ...
);

*2. Constraint Name*

CEP-42 states, "Name of the constraint is optional. If it is not provided,
a name is generated for the constraint."

However, based on the actual statements defining constraints, I believe
names should be *mandatory* for clarity and usability. System-generated
names often lack descriptiveness.

*3. Cross-Column Constraints*

CEP-42 proposes allowing constraints that compare multiple columns. For
example,

CREATE TABLE keyspace.table (
  p1 int,
  p2 int,
  ...,
  CONSTRAINT [name] CHECK (p1 != p2)
);

Such constraints can be problematic due to their referential nature.
Consider scenarios where column p2 is dropped, or when insert/update
operations include only partial values (e.g., only inserting p1). Should
the query result in a read (before write), or should it fail due to
incomplete values?

For simplicity, I propose that, at least for the initial iteration, we
exclude support for cross-column constraints. In other words, constraints
should only check the values of individual columns.
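
To make the partial-write problem concrete, here is a minimal sketch in Python. It is not how the patch evaluates constraints; the functions check_single_column and check_cross_column are hypothetical, and rows are modeled as plain dictionaries. It only shows that a single-column check can be decided from the write alone, while CHECK (p1 != p2) cannot be decided when only p1 is written unless the stored row is read first.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, int]


def check_single_column(write: Row, column: str,
                        predicate: Callable[[int], bool]) -> bool:
    """A single-column constraint only needs the value being written."""
    if column not in write:
        return True  # the write does not touch this column
    return predicate(write[column])


def check_cross_column(write: Row, stored: Optional[Row]) -> Optional[bool]:
    """Evaluate CHECK (p1 != p2).

    Returns None when the result cannot be decided from the available
    values, i.e. the write is partial and no stored row was read.
    """
    merged = dict(stored or {})
    merged.update(write)  # values in the write override stored values
    if "p1" not in merged or "p2" not in merged:
        return None
    return merged["p1"] != merged["p2"]


if __name__ == "__main__":
    in_range = lambda v: 0 <= v < 256
    print(check_single_column({"r": 300}, "r", in_range))
    print(check_cross_column({"p1": 1}, None))       # partial write, no read
    print(check_cross_column({"p1": 1}, {"p2": 1}))  # after a read-before-write
```

The None result corresponds to the open question above: either the write triggers a read, or it must be rejected as incomplete.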

- Yifan

On Thu, Sep 19, 2024 at 11:46 AM Patrick McFadin  wrote:

> Thanks for the update. My inbox search failed me :D
>
> On Thu, Sep 19, 2024 at 11:31 AM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> Hi Patrick,
>>
>> Thanks for taking a look at this and keeping the house tidy.
>>
>> I announced the voting results on a separate thread:
>> lists.apache.org
>>
>> As a follow up, this is not stalled, and I’m currently working on a patch
>> that will be soon available for review.
>>
>> Thanks,
>> Bernardo
>>
>>
>> On Sep 19, 2024, at 11:20 AM, Patrick McFadin  wrote:
>>
>> I'm going to cap this thread. Vote passes with no binding -1s.
>>
>> On Tue, Jul 2, 2024 at 2:25 PM Jordan West  wrote:
>>
>>> +1
>>>
>>> On Tue, Jul 2, 2024 at 12:15 Francisco Guerrero 
>>> wrote:
>>>
 +1

 On 2024/07/02 18:45:33 Josh McKenzie wrote:
 > +1
 >
 > On Tue, Jul 2, 2024, at 1:18 PM, Abe Ratnofsky wrote:
 > > +1 (nb)
 > >
 > >> On Jul 2, 2024, at 12:15 PM, Yifan Cai  wrote:
 > >>
 > >> +1 on CEP-42.
 > >>
 > >> - Yifan
 > >>
 > >> On Tue, Jul 2, 2024 at 5:17 AM Jon Haddad 
 wrote:
 > >>> +1
 > >>>
 > >>> On Tue, Jul 2, 2024 at 5:06 AM  wrote:
 >  +1
 > 
 > 
 > > On Jul 1, 2024, at 8:34 PM, Doug Rohrer 
 wrote:
 > >
 > > +1 (nb) - Thanks for all of the suggestions and Bernardo for
 wrangling the CEP into shape!
 > >
 > > Doug
 > >
 > >> On Jul 1, 2024, at 3:06 PM, Dinesh Joshi 
 wrote:
 > >>
 > >> +1
 > >>
 > >> On Mon, Jul 1, 2024 at 11:58 AM Ariel Weisberg <
 ar...@weisberg.ws> wrote:
 > >>> __
 > >>> Hi,
 > >>>
 > >>> I am +1 on CEP-42 with the latest updates to the CEP to
 clarify syntax, error messages, constraint naming and generated naming,
 alter/drop, describe etc.
 > >>>
 > >>> I think this now tracks very closely to how other SQL
 databases define constraints and the syntax is easily extensible to
 multi-column and multi-table constraints.
 > >>>
 > >>> Ariel
 > >>>
 > >>> On Mon, Jul 1, 

Re: [DISCUSS] Introduce CREATE TABLE LIKE grammer

2024-10-24 Thread guo Maxwell
Yes, you are right. I will add this.

On Thu, Oct 24, 2024 at 4:42 PM Štefan Miklošovič wrote:

> The CEP should also mention that copying system tables, virtual tables,
> materialized views, and similar is not supported, and that an attempt to
> do so will fail with an error.
>
> On Thu, Oct 24, 2024 at 7:16 AM Dave Herrington 
> wrote:
>
>> Strong +1 to copy all options by default. This is intuitive to me.  Then
>> I would like to explicitly override any options of my choosing.
>>
>> -Dave
>>
>> On Wed, Oct 23, 2024 at 9:57 PM guo Maxwell  wrote:
>>
>>> OK, thank you for your suggestions. I will revise the CEP to copy table
>>> OPTIONS by default.
>>>
>>> On Wed, Oct 23, 2024 at 9:18 PM Jon Haddad wrote:
>>>
 Also strongly +1 to copying all the options.


 On Wed, Oct 23, 2024 at 5:52 AM Josh McKenzie 
 wrote:

> I'm a very strong +1 to having the default functionality be to copy
> *ALL* options.
>
> Intuitively, as a user, if I tell a software system to make a clone of
> something I don't expect it to be shallow or a subset defined by some
> external developer somewhere. I expect it to be a clone.
>
> Adding in some kind of "lean" mode or "column only" is fine if someone
> can make a cogent argument around its inclusion. I don't personally see a
> use-case for it right now but definitely open to being educated.
>
> On Wed, Oct 23, 2024, at 3:03 AM, Štefan Miklošovič wrote:
>
> Options are inherently part of the table, just like the schema. In
> fact, the _schema_ includes all options, not just the columns and their
> names. If you change an option, you effectively have a different schema;
> the schema version changes when an option changes. So if we do not copy
> options too, we are kind of faking it (when we do not specify WITH
> OPTIONS).
>
> Also, imagine a situation where Accord is merged to trunk. It
> introduces a new schema option, "transactional = full", which is not the
> default. (I am sorry if I got the spelling wrong here.) So, when you have
> a table with transactional support and you run "create table ks.tb_copy
> like ks.tb" and we _do not_ copy all options, the new table will _not_ be
> transactional.
>
> The next thing you do is execute some transactions against this table,
> but you cannot, because the table is not transactional, because you
> forgot to add "WITH OPTIONS". So you need to go back and run
> "ALTER TABLE ks.tb_copy WITH transactional = full" just to support that.
>
> I think you can see from this pattern that it is much better to copy
> all options by default than to make users consciously opt in to them.
>
> also:
>
> "but I think there are also some users want to do basic column
> information copy"
>
> where is this coming from? Do you have this idea somehow empirically
> tested? I just do not see why somebody would want to have Cassandra's
> defaults instead of what a base table contains.
>
> On Wed, Oct 23, 2024 at 8:28 AM guo Maxwell 
> wrote:
>
> The reason for using the OPTION keyword is that I want to give users
> more choices.
> The default behavior when copying a table is to copy the table's basic
> items (columns and their data types, masks, constraints); other things
> that belong to the table, such as options, views, and triggers, are
> optional in my mind.
> You are absolutely right that users may want to copy everything, but I
> think there are also some users who only want to copy the basic column
> information, so I just give them a choice. As we know, the number of
> table parameters is not small: compression, compaction, gc_seconds,
> bf_chance, speculative_retry, and so on.
>
> Besides, PostgreSQL also has the keywords COMMENT and COMPRESSION, which
> behave similarly to our OPTION keyword.
>
> That is why I added the OPTION keyword.
>
>
> On Tue, Oct 22, 2024 at 11:40 PM Štefan Miklošovič wrote:
>
> The problem is that when I do this minimal CQL which shows this
> feature:
>
> CREATE TABLE ks.tb_copy LIKE ks.tb;
>
> then you are saying that when I _do not_ specify WITH OPTIONS then I
> get Cassandra's defaults. Only after I specify WITH OPTIONS, it would
> truly be a copy.
>
> This is not a good design. Because to have an exact copy, I have to
> make a conscious effort to include OPTIONS as well. That should not be the
> case. I just want to have a copy, totally the same stuff, when I use the
> minimal version of that statement. It would be better to opt-out from
> options like
>
> CREATE TABLE ks.tb_copy LIKE ks.tb WITHOUT OPTIONS (you feel me) but
> we do not support this (yet).
>
> On Tue, Oct 22, 2024 at 5:28 PM Štefan Miklošovič <
> smikloso...@apache.org> wrote:
>
> I jus

Re: [DISCUSS] Introduce CREATE TABLE LIKE grammer

2024-10-24 Thread Štefan Miklošovič
The CEP should also mention that copying system tables, virtual tables,
materialized views, and similar is not supported, and that an attempt to do
so will fail with an error.
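
The behavior discussed in this thread (copy all options by default, allow explicit overrides, reject unsupported sources) can be sketched as follows. This is only an illustration in Python, not the CEP's implementation: TableDef, create_table_like, and the kind labels are hypothetical names, and the option values are examples.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical labels for sources that the copy should refuse.
UNSUPPORTED_KINDS = {"system", "virtual", "materialized_view"}


@dataclass
class TableDef:
    name: str
    kind: str = "regular"
    columns: Dict[str, str] = field(default_factory=dict)
    options: Dict[str, str] = field(default_factory=dict)


def create_table_like(source: TableDef, new_name: str,
                      overrides: Optional[Dict[str, str]] = None) -> TableDef:
    """Copy columns and ALL options by default, then apply explicit overrides."""
    if source.kind in UNSUPPORTED_KINDS:
        raise ValueError(f"cannot copy {source.kind} table {source.name}")
    options = {**source.options, **(overrides or {})}  # overrides win
    return TableDef(new_name, "regular", dict(source.columns), options)


if __name__ == "__main__":
    base = TableDef("ks.tb", columns={"id": "int"},
                    options={"transactional": "full", "gc_grace_seconds": "864000"})
    copy = create_table_like(base, "ks.tb_copy", {"gc_grace_seconds": "3600"})
    print(copy.options)
```

With this default, the copy keeps options such as the transactional setting unless the user overrides them.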

On Thu, Oct 24, 2024 at 7:16 AM Dave Herrington 
wrote:

> Strong +1 to copy all options by default. This is intuitive to me.  Then I
> would like to explicitly override any options of my choosing.
>
> -Dave
>
> On Wed, Oct 23, 2024 at 9:57 PM guo Maxwell  wrote:
>
>> OK, thank you for your suggestions. I will revise the CEP to copy table
>> OPTIONS by default.
>>
>> On Wed, Oct 23, 2024 at 9:18 PM Jon Haddad wrote:
>>
>>> Also strongly +1 to copying all the options.
>>>
>>>
>>> On Wed, Oct 23, 2024 at 5:52 AM Josh McKenzie 
>>> wrote:
>>>
 I'm a very strong +1 to having the default functionality be to copy
 *ALL* options.

 Intuitively, as a user, if I tell a software system to make a clone of
 something I don't expect it to be shallow or a subset defined by some
 external developer somewhere. I expect it to be a clone.

 Adding in some kind of "lean" mode or "column only" is fine if someone
 can make a cogent argument around its inclusion. I don't personally see a
 use-case for it right now but definitely open to being educated.

 On Wed, Oct 23, 2024, at 3:03 AM, Štefan Miklošovič wrote:

 Options are inherently part of the table, just like the schema. In
 fact, the _schema_ includes all options, not just the columns and their
 names. If you change an option, you effectively have a different schema;
 the schema version changes when an option changes. So if we do not copy
 options too, we are kind of faking it (when we do not specify WITH
 OPTIONS).

 Also, imagine a situation where Accord is merged to trunk. It
 introduces a new schema option, "transactional = full", which is not the
 default. (I am sorry if I got the spelling wrong here.) So, when you have
 a table with transactional support and you run "create table ks.tb_copy
 like ks.tb" and we _do not_ copy all options, the new table will _not_ be
 transactional.

 The next thing you do is execute some transactions against this table,
 but you cannot, because the table is not transactional, because you
 forgot to add "WITH OPTIONS". So you need to go back and run
 "ALTER TABLE ks.tb_copy WITH transactional = full" just to support that.

 I think you can see from this pattern that it is much better to copy
 all options by default than to make users consciously opt in to them.

 also:

 "but I think there are also some users want to do basic column
 information copy"

 where is this coming from? Do you have this idea somehow empirically
 tested? I just do not see why somebody would want to have Cassandra's
 defaults instead of what a base table contains.

 On Wed, Oct 23, 2024 at 8:28 AM guo Maxwell 
 wrote:

 The reason for using the OPTION keyword is that I want to give users
 more choices.
 The default behavior when copying a table is to copy the table's basic
 items (columns and their data types, masks, constraints); other things
 that belong to the table, such as options, views, and triggers, are
 optional in my mind.
 You are absolutely right that users may want to copy everything, but I
 think there are also some users who only want to copy the basic column
 information, so I just give them a choice. As we know, the number of
 table parameters is not small: compression, compaction, gc_seconds,
 bf_chance, speculative_retry, and so on.

 Besides, PostgreSQL also has the keywords COMMENT and COMPRESSION, which
 behave similarly to our OPTION keyword.

 That is why I added the OPTION keyword.


 On Tue, Oct 22, 2024 at 11:40 PM Štefan Miklošovič wrote:

 The problem is that when I do this minimal CQL which shows this feature:

 CREATE TABLE ks.tb_copy LIKE ks.tb;

 then you are saying that when I _do not_ specify WITH OPTIONS then I
 get Cassandra's defaults. Only after I specify WITH OPTIONS, it would
 truly be a copy.

 This is not a good design. Because to have an exact copy, I have to
 make a conscious effort to include OPTIONS as well. That should not be the
 case. I just want to have a copy, totally the same stuff, when I use the
 minimal version of that statement. It would be better to opt-out from
 options like

 CREATE TABLE ks.tb_copy LIKE ks.tb WITHOUT OPTIONS (you feel me) but we
 do not support this (yet).

 On Tue, Oct 22, 2024 at 5:28 PM Štefan Miklošovič <
 smikloso...@apache.org> wrote:

 I just don't see OPTIONS as important. When I want to copy a table, I
 am copying a table _with everything_. Options included, by default. Why
 would I want to have a copy of a table with options different from the base
 one?


 On Mo

Re: CEP-32: Open-Telemetry integration

2024-10-24 Thread Yuki Morishita
Hi Maxim, thanks for taking a look at the CEP and giving the feedback.

CEP-32 is built on the current Cassandra tracing/metrics/logs
implementation, and I chose OpenTelemetry and its protocol, OTLP, for
exporting them. OpenTelemetry is supported by many APM vendors, so users
can easily send telemetry from Cassandra.

A custom API or CQL might be better, but integrating them with APM
systems would require another component.

I take your point on the granularity and performance concerns. They should
be addressed, but CEP-32 focuses on exporting based on the current
implementation.

> 1. The first is how do we manage all these integrations, because
> according to the CEP we are adding new dependencies and interfaces [1]
> to the project and adding new configuration values, this is not bad in
> itself. However, it also means that as the number of integrations
> increases, so does the maintenance of the project and config (the
> vision - is to have minimal extra deps in the core and the smallest
> config).

I agree that the extra dependencies should be minimal. That's why my
proposal is to support only the OTLP exporter. With this, users can use any
APM that supports direct ingestion of OTLP, or they can bring an
OpenTelemetry Collector to do the necessary processing and export to
whatever they want. Alternatively, users can download other exporters
(e.g. Jaeger) and use those.

The CEP introduces only one new configuration option, which enables
OpenTelemetry. Other settings are available through OpenTelemetry SDK
Autoconfigure [1].

[1]
https://opentelemetry.io/docs/languages/java/configuration/#zero-code-sdk-autoconfigure
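
As a rough illustration of what autoconfiguration looks like from the operator's side, the sketch below builds the standard OpenTelemetry environment variables for an OTLP exporter and renders them as shell export lines. The variable names are the standard OTEL_* ones read by the SDK autoconfigure module; the helper functions, the service name, and the endpoint are assumptions, and how Cassandra would pass these settings to the SDK is not specified in this thread.

```python
import shlex
from typing import Dict


def otlp_env(service_name: str,
             endpoint: str = "http://localhost:4317",
             protocol: str = "grpc") -> Dict[str, str]:
    """Build standard OTEL_* settings that select the OTLP exporter.

    The default endpoint and protocol here are illustrative assumptions.
    """
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_METRICS_EXPORTER": "otlp",
        "OTEL_LOGS_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_PROTOCOL": protocol,
    }


def render_exports(env: Dict[str, str]) -> str:
    """Render the settings as shell 'export' lines, quoting where needed."""
    return "\n".join(f"export {k}={shlex.quote(v)}" for k, v in env.items())


if __name__ == "__main__":
    print(render_exports(otlp_env("cassandra")))
```

Pointing the endpoint at an OpenTelemetry Collector instead of an APM backend gives the second deployment option described above.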

On Thu, Oct 24, 2024 at 6:47 AM Maxim Muzafarov  wrote:

> Hello,
>
>
> I wanted to share some ideas and a vision regarding metrics,
> tracing, and the adoption of new integrations, particularly
> OpenTelemetry. I personally feel that the more integrations we have,
> the better the adoption of Cassandra as a database will be. With
> OpenTelemetry, users could have a better "first experience", so I'm +1
> here.
>
> I have two concerns with the way we currently handle such integrations:
>
> 1. The first is how do we manage all these integrations, because
> according to the CEP we are adding new dependencies and interfaces [1]
> to the project and adding new configuration values, this is not bad in
> itself. However, it also means that as the number of integrations
> increases, so does the maintenance of the project and config (the
> vision - is to have minimal extra deps in the core and the smallest
> config).
>
> 2. Exporting metrics/logs should not affect the node itself (having to
> adjust the node's JVM params [2] to make the integration work tells us
> that we are doing something wrong) or the JVM process that does the
> main work with the data by handling user requests. Serving
> metrics/logs has lower priority than serving a user request. The current
> approach of adding new metric exporters and/or instrumenting JVM
> agents could affect the stability and performance of the node, and
> bugs could prevent the node from serving user requests (e.g.
> calculating histograms instead of exporting raw ones [3] causes GCs and
> impacts the node).
>
>
>
> With all that, the alternate solution and the vision I'm trying to
> highlight here is that we should just rely on the native protocol and
> "incorporate" these things into the native protocol itself and CQL as
> its part.
> That way, Cassandra Sidecar and other sub-projects interested in the
> internal state of the node can rely only on the protocol specification
> and the query syntax.
>
> Specifically, querying the node's internal state (basically metrics
> and logs) is being done using two paradigms: "poll" and "push".
>
> 1. The "poll" is the simplest part, we already have all we need - lots
> of virtual tables. A new virtual keyspace "system_metrics" [4]
> contains all the internal metrics in the Prometheus format that
> Cassandra exposes in JMX, and can be queried by any other system (e.g.
> the Cassandra Sidecar that has established a local connection via Unix
> Domain Socket to query the metrics) to expose them via the REST API or
> other interfaces they need. The efficiency of exposing these metrics
> is the best we can offer in terms of performance (I hope).
>
> 2. The "push" part is unfortunately not implemented yet, but it is
> normally used and designed to export logs and internal events. The
> vision is to register a continuous query that listens for log
> updates on the node, which is also a part of the Sidecar. Such a
> feature would be useful in itself, regardless of the fact that in our
> case we are going to use it to listen to internal events and log
> updates. From my point of view, other database vendors offer something
> similar that Cassandra lacks:
>
> https://cloud.google.com/bigquery/docs/continuous-queries-introduction
> https://docs.influxdata.com/influxdb/v1/query_language/continuous_quer