Re: [Discuss] Repair inside C*

2024-10-22 Thread Benedict
I realise it’s out of scope, but to counterbalance all of the pro-decomposition messages I wanted to chime in with a strong -1. But we can debate that in a suitable context later.

On 22 Oct 2024, at 16:36, Jordan West wrote:

Agreed with the sentiment that decomposition is a good target but out of scope here. I’m personally excited to see an in-tree repair scheduler and am supportive of the approach shared here.

Jordan

On Tue, Oct 22, 2024 at 08:12 Dinesh Joshi wrote:

Decomposing Cassandra may be architecturally desirable but that is not the goal of this CEP. This CEP brings value to operators today so it should be considered on that merit. We definitely need to have a separate conversation on Cassandra's architectural direction.

On Tue, Oct 22, 2024 at 7:51 AM Joseph Lynch wrote:

Definitely like this in C* itself. We only changed our proposal to putting repair scheduling in the sidecar before because trunk was frozen for the foreseeable future at that time. With trunk unfrozen and development on the main process going at a fast pace I think it makes way more sense to integrate natively as table properties as this CEP proposes. Completely agree the scheduling overhead should be minimal.

Moving the actual repair operation (comparing data and streaming mismatches) along with compaction operations to a separate process long term makes a lot of sense but imo only once we both have a release of sidecar and a contract figured out between them on communication. I'm watching CEP-38 there as I think CQL and virtual tables are looking much stronger than when we wrote CEP-1 and chose HTTP but that's for that discussion and not this one.

-Joey

On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero wrote:

Like others have said, I was expecting the scheduling portion of repair is
negligible. I was mostly curious if you had something handy that you can
quickly share.

On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
> >Jaydeep, do you have any metrics on your clusters comparing them before
> and after introducing repair scheduling into the Cassandra process?
> 
> Yes, I had made some comparisons when I started rolling this feature out to
> our production five years ago :)  Here are the details:
> *The Scheduling*
> The scheduling itself is exceptionally lightweight, as only one additional
> thread monitors the repair activity, updating the status to a system table
> once every few minutes or so. So, it does not appear anywhere in the CPU
> charts, etc. Unfortunately, I do not have those graphs now, but I can do a
> quick comparison if it helps!
> 
> *The Repair Itself*
> As we all know, the Cassandra repair algorithm is a heavy-weight process
> due to Merkle tree/streaming, etc., no matter how we schedule it. But it is
> an orthogonal topic and folks are already discussing creating a new CEP.
> 
> Jaydeep
> 
> 
> On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero 
> wrote:
> 
> > Jaydeep, do you have any metrics on your clusters comparing them before
> > and after introducing repair scheduling into the Cassandra process?
> >
> > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
> > > Sounds good. Just wanted to bring it up. I agree that the scheduling bit
> > is
> > > pretty light weight and the ideal would be to bring the whole of the
> > repair
> > > external, which is a much bigger can of worms to open.
> > >
> > >
> > >
> > > -Jeremiah
> > >
> > >
> > >
> > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink 
> > wrote:
> > > >
> > > >
> > >
> > > > 
> > > >
> > > > > I actually think we should be looking at how we can move things out
> > of the
> > > > database process.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > While worth pursuing, I think we would need a different CEP just to
> > figure
> > > > out how to do that. Not only is there a lot of infrastructure
> > difficulty in
> > > > running multi process, the inter app communication needs to be figured
> > out
> > > > > better than JMX. Even with the sidecar we don't have a solid story on how to
> > > > ensure both are running or anything yet. It's up to each app owner to
> > figure
> > > > it out. Once we have a good thing in place I think we can start moving
> > > > compactions, repairs, etc out of the database. Even then it's the
> > _repairs_
> > > > that is expensive, not the scheduling.
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan
> > > > <[jeremiah.jor...@gmail.com](mailto:jeremiah.jor...@gmail.com)>
> > wrote:
> > > >
> > > >
> > >
> > > >> I love the idea of a repair service being there by default for an
> > install
> > > of C*.  My main concern here is that it is putting more services into
> > the main
> > > database process.  I actually think we should be looking at how we can
> > move
> > > things out of the database process.  The C* process being a giant
> > monolith has
> > > always been a pain point.  Is there anyway it makes 

Re: [Discuss] Repair inside C*

2024-10-22 Thread Jordan West
Agreed with the sentiment that decomposition is a good target but out of
scope here. I’m personally excited to see an in-tree repair scheduler and
am supportive of the approach shared here.

Jordan

On Tue, Oct 22, 2024 at 08:12 Dinesh Joshi  wrote:

> Decomposing Cassandra may be architecturally desirable but that is not the
> goal of this CEP. This CEP brings value to operators today so it should be
> considered on that merit. We definitely need to have a separate
> conversation on Cassandra's architectural direction.
>
> On Tue, Oct 22, 2024 at 7:51 AM Joseph Lynch 
> wrote:
>
>> Definitely like this in C* itself. We only changed our proposal to
>> putting repair scheduling in the sidecar before because trunk was frozen
>> for the foreseeable future at that time. With trunk unfrozen and
>> development on the main process going at a fast pace I think it makes way
>> more sense to integrate natively as table properties as this CEP proposes.
>> Completely agree the scheduling overhead should be minimal.
>>
>> Moving the actual repair operation (comparing data and streaming
>> mismatches) along with compaction operations to a separate process long
>> term makes a lot of sense but imo only once we both have a release of
>> sidecar and a contract figured out between them on communication. I'm
>> watching CEP-38 there as I think CQL and virtual tables are looking much
>> stronger than when we wrote CEP-1 and chose HTTP but that's for that
>> discussion and not this one.
>>
>> -Joey
>>
>> On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero 
>> wrote:
>>
>>> Like others have said, I was expecting the scheduling portion of repair
>>> is
>>> negligible. I was mostly curious if you had something handy that you can
>>> quickly share.
>>>
>>> On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
>>> > >Jaydeep, do you have any metrics on your clusters comparing them
>>> before
>>> > and after introducing repair scheduling into the Cassandra process?
>>> >
>>> > Yes, I had made some comparisons when I started rolling this feature
>>> out to
>>> > our production five years ago :)  Here are the details:
>>> > *The Scheduling*
>>> > The scheduling itself is exceptionally lightweight, as only one
>>> additional
>>> > thread monitors the repair activity, updating the status to a system
>>> table
>>> > once every few minutes or so. So, it does not appear anywhere in the
>>> CPU
>>> > charts, etc. Unfortunately, I do not have those graphs now, but I can
>>> do a
>>> > quick comparison if it helps!
>>> >
>>> > *The Repair Itself*
>>> > As we all know, the Cassandra repair algorithm is a heavy-weight
>>> process
>>> > due to Merkle tree/streaming, etc., no matter how we schedule it. But
>>> it is
>>> > an orthogonal topic and folks are already discussing creating a new
>>> CEP.
>>> >
>>> > Jaydeep
>>> >
>>> >
>>> > On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero <
>>> fran...@apache.org>
>>> > wrote:
>>> >
>>> > > Jaydeep, do you have any metrics on your clusters comparing them
>>> before
>>> > > and after introducing repair scheduling into the Cassandra process?
>>> > >
>>> > > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
>>> > > > Sounds good. Just wanted to bring it up. I agree that the
>>> scheduling bit
>>> > > is
>>> > > > pretty light weight and the ideal would be to bring the whole of
>>> the
>>> > > repair
>>> > > > external, which is a much bigger can of worms to open.
>>> > > >
>>> > > >
>>> > > >
>>> > > > -Jeremiah
>>> > > >
>>> > > >
>>> > > >
>>> > > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink <
>>> clohfin...@gmail.com>
>>> > > wrote:
>>> > > > >
>>> > > > >
>>> > > >
>>> > > > > 
>>> > > > >
>>> > > > > > I actually think we should be looking at how we can move
>>> things out
>>> > > of the
>>> > > > > database process.
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > While worth pursuing, I think we would need a different CEP just
>>> to
>>> > > figure
>>> > > > > out how to do that. Not only is there a lot of infrastructure
>>> > > difficulty in
>>> > > > > running multi process, the inter app communication needs to be
>>> figured
>>> > > out
>>> > > > > better than JMX. Even with the sidecar we don't have a solid story on
>>> how to
>>> > > > > ensure both are running or anything yet. It's up to each app
>>> owner to
>>> > > figure
>>> > > > > it out. Once we have a good thing in place I think we can start
>>> moving
>>> > > > > compactions, repairs, etc out of the database. Even then it's the
>>> > > _repairs_
>>> > > > > that is expensive, not the scheduling.
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan
>>> > > > > <[jeremiah.jor...@gmail.com](mailto:jeremiah.jor...@gmail.com)>
>>> > > wrote:
>>> > > > >
>>> > > > >
>>> > > >
>>> > > > >> I love the idea of a repair service being there by default for
>>> an
>>> > > install
>>> > > > of C*.  My main concern here is that it is putting more services
>>> into
>>> > > the main
>>>
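
Jaydeep's description quoted above (one additional thread that wakes every few minutes and writes repair status to a system table) can be sketched roughly as below. This is only an illustration of why such a thread is cheap, not the CEP's implementation: `RepairStatusMonitor`, `read_status` and `write_status` are hypothetical names, and the "system table" is stood in for by a plain callable.

```python
import threading


class RepairStatusMonitor:
    """Hypothetical stand-in for the single scheduling thread described in the thread."""

    def __init__(self, poll_interval_s, read_status, write_status):
        self.poll_interval_s = poll_interval_s
        self.read_status = read_status    # callable returning the current repair state
        self.write_status = write_status  # callable persisting it (a system table in the real feature)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            self.write_status(self.read_status())
            # Event.wait is an interruptible sleep: the thread is idle between
            # polls, and wait() returns True as soon as stop() is called.
            if self._stop.wait(self.poll_interval_s):
                break

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


if __name__ == "__main__":
    recorded = []
    monitor = RepairStatusMonitor(0.01, lambda: "IN_PROGRESS", recorded.append)
    monitor.start()
    monitor.stop()
    print(f"status written {len(recorded)} time(s)")
```

With a poll interval of minutes rather than milliseconds, the thread spends nearly all of its time blocked in `wait`, which matches the claim that scheduling does not show up in CPU charts. The repair work itself (Merkle trees, streaming) is outside this sketch.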

Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-22 Thread Ekaterina Dimitrova
Honestly, counting the letters was also a thing that happened to me but I
should admit that even with CASSANDRAANALYTICS we count the As…

My preference is CASSX

Seems shorter and less painful to read to me as a user.

Thanks

On Tue, 22 Oct 2024 at 11:18, Patrick McFadin  wrote:

> CASS + NAME is my +1
>
> TBH rarely will this be typed. Just copied and pasted. It has to be
> clear that naming is different from the other projects and I think we
> get it either way.
>
> On Tue, Oct 22, 2024 at 8:15 AM Štefan Miklošovič
>  wrote:
> >
> > Something like this?
> >
> > CASSANDRA
> > CASSPYTHON
> > CASSGO
> > CASSJAVA
> > CASSSIDECAR
> > CASSANALYTICS
> >
> > if we expand it would be like
> >
> > CASSANDRA
> > CASSANDRAPYTHON
> > CASSANDRAGO
> > CASSANDRAJAVA
> > CASSANDRASIDECAR
> > CASSANDRAANALYTICS
> >
> > I don't know ... the first form seems fine to me but that triple S in
> CASSSIDECAR is strange. I just find myself counting S's when I type it.
> >
> > Up to you guys. I don't mind both.
> >
> > On Tue, Oct 22, 2024 at 5:01 PM Brandon Williams 
> wrote:
> >>
> >> I don't think underscore is an option from selfserve anyway.  If we
> >> have to stick everything together then I think having fewer things is
> >> better, so we could drop the 'driver' and just name things like
> >> CASSPYTHON.  WDYT?
> >>
> >> Kind Regards,
> >> Brandon
> >>
> >> On Tue, Oct 22, 2024 at 9:33 AM Štefan Miklošovič
> >>  wrote:
> >> >
> >> > So we will have stuff like
> >> >
> >> > CASS_DRIVER_PYTHON and all tickets in CHANGES.txt as well as in the
> commit messages will be like
> >> >
> >> > CASS_DRIVER_PYTHON-1234
> >> >
> >> > I checked (1) and there is not a single one which has underscores in
> its name, now THAT would be a precedent, wouldn't it ...
> >> >
> >> > (1)
> https://issues.apache.org/jira/secure/BrowseProjects.jspa?selectedCategory=all&selectedProjectType=all
> >> >
> >> > On Tue, Oct 22, 2024 at 4:17 PM Martin Sucha 
> wrote:
> >> >>
> >> >> This seems to be relevant documentation:
> https://confluence.atlassian.com/adminjiraserver/changing-the-project-key-format-938847081.html
> >> >>
> >> >> Martin
> >> >>
> >> >> 
> >> >> This email, including attached files, may contain confidential
> information and is intended only for the use of the individual and/or
> entity to which it is addressed. If you are not the intended recipient,
> disclosure, copying, use, or distribution of the information included in
> this email and/or in its attachments is prohibited.
> >> >> If you have received it by mistake, please do not read, copy or use
> it, or disclose its contents to others. Please notify the sender that you
> have received this email by mistake by replying to the email, and then
> delete the email and any copies and attachments of it. Thank you.
>
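
The naming trade-offs discussed above (no underscores, and the triple S in CASSSIDECAR) can be checked mechanically. The sketch below is only an illustration: the regular expression is an assumption about Jira's default project-key rule (two or more uppercase letters) and should be checked against the linked Atlassian documentation and the actual instance configuration; `is_valid_key`, `longest_run` and the subproject list are names made up for this example.

```python
import re

# Assumption: default Jira project-key rule, two or more uppercase letters.
# The real pattern is configurable per instance; verify before relying on it.
ASSUMED_KEY_PATTERN = re.compile(r"[A-Z][A-Z]+")

SUBPROJECTS = ["PYTHON", "GO", "JAVA", "SIDECAR", "ANALYTICS"]


def is_valid_key(key):
    """True if the whole key matches the assumed default pattern."""
    return ASSUMED_KEY_PATTERN.fullmatch(key) is not None


def longest_run(key):
    """Length of the longest run of one repeated character in key."""
    if not key:
        return 0
    best = run = 1
    for prev, cur in zip(key, key[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def review(prefix):
    """Print each candidate key with its validity and longest letter run."""
    for name in SUBPROJECTS:
        key = prefix + name
        print(f"{key:<20} valid={is_valid_key(key)} longest_run={longest_run(key)}")


if __name__ == "__main__":
    review("CASS")
    review("CASSANDRA")
    print("CASS_DRIVER_PYTHON valid:", is_valid_key("CASS_DRIVER_PYTHON"))
```

Under that assumed pattern the underscore form is rejected, and the only candidate with a run of three identical letters is CASSSIDECAR, which is the key Štefan found himself counting.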


Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-22 Thread Patrick McFadin
CASS + NAME is my +1

TBH rarely will this be typed. Just copied and pasted. It has to be
clear that naming is different from the other projects and I think we
get it either way.

On Tue, Oct 22, 2024 at 8:15 AM Štefan Miklošovič
 wrote:
>
> Something like this?
>
> CASSANDRA
> CASSPYTHON
> CASSGO
> CASSJAVA
> CASSSIDECAR
> CASSANALYTICS
>
> if we expand it would be like
>
> CASSANDRA
> CASSANDRAPYTHON
> CASSANDRAGO
> CASSANDRAJAVA
> CASSANDRASIDECAR
> CASSANDRAANALYTICS
>
> I don't know ... the first form seems fine to me but that triple S in 
> CASSSIDECAR is strange. I just find myself counting S's when I type it.
>
> Up to you guys. I don't mind both.
>
> On Tue, Oct 22, 2024 at 5:01 PM Brandon Williams  wrote:
>>
>> I don't think underscore is an option from selfserve anyway.  If we
>> have to stick everything together then I think having fewer things is
>> better, so we could drop the 'driver' and just name things like
>> CASSPYTHON.  WDYT?
>>
>> Kind Regards,
>> Brandon
>>
>> On Tue, Oct 22, 2024 at 9:33 AM Štefan Miklošovič
>>  wrote:
>> >
>> > So we will have stuff like
>> >
>> > CASS_DRIVER_PYTHON and all tickets in CHANGES.txt as well as in the commit 
>> > messages will be like
>> >
>> > CASS_DRIVER_PYTHON-1234
>> >
>> > I checked (1) and there is not a single one which has underscores in its 
>> > name, now THAT would be a precedent, wouldn't it ...
>> >
>> > (1) 
>> > https://issues.apache.org/jira/secure/BrowseProjects.jspa?selectedCategory=all&selectedProjectType=all
>> >
>> > On Tue, Oct 22, 2024 at 4:17 PM Martin Sucha  wrote:
>> >>
>> >> This seems to be relevant documentation: 
>> >> https://confluence.atlassian.com/adminjiraserver/changing-the-project-key-format-938847081.html
>> >>
>> >> Martin
>> >>


Re: [DISCUSS] Introduce CREATE TABLE LIKE grammer

2024-10-22 Thread Štefan Miklošovič
The problem is that when I do this minimal CQL which shows this feature:

CREATE TABLE ks.tb_copy LIKE ks.tb;

then you are saying that when I _do not_ specify WITH OPTIONS then I get
Cassandra's defaults. Only after I specify WITH OPTIONS, it would truly be
a copy.

This is not a good design. Because to have an exact copy, I have to make a
conscious effort to include OPTIONS as well. That should not be the case. I
just want to have a copy, totally the same stuff, when I use the minimal
version of that statement. It would be better to opt-out from options like

CREATE TABLE ks.tb_copy LIKE ks.tb WITHOUT OPTIONS (you feel me) but we do
not support this (yet).

On Tue, Oct 22, 2024 at 5:28 PM Štefan Miklošovič 
wrote:

> I just don't see OPTIONS as important. When I want to copy a table, I am
> copying a table _with everything_. Options included, by default. Why would
> I want to have a copy of a table with options different from the base one?
>
>
> On Mon, Oct 21, 2024 at 3:55 PM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> Hi Guo,
>>
>> +1 for the CONSTRAINTS keyword to be added into the default behavior.
>>
>> Bernardo
>>
>> On Oct 21, 2024, at 12:01 AM, guo Maxwell  wrote:
>>
>> I think the CONSTRAINTS keyword may be in the same situation as
>> datamask.
>> Maybe it is better to include  constraints into  the default behavior of
>> table copy together with column name, column data type and data mask.
>>
>> guo Maxwell wrote on Mon, Oct 21, 2024 at 14:56:
>>
>>> To yifan :
>>> I don't mind adding the ALL keyword, and it has been updated into CEP.
>>>
>>> As all you can see, our original intention was that the grammar would
>>> not be too complicated, which is what I described in cep
>>> 
>>> .
>>> We gave up PG-related grammar, including INCLUDING/EXCLUDING and so on .
>>>
>>> guo Maxwell wrote on Mon, Oct 21, 2024 at 14:52:
>>>
>>>> Hi,
>>>> To Stefan:
>>>> I may want to explain that if there is no OPTION keyword in the CQL
>>>> statement, then the newly created table will only have the
>>>> original table's column name, column type and data mask. I think this is
>>>> the most basic choice when copying tables to users.
>>>> Then we do some addition, we can add original table's table options
>>>> like compaction strategy/compress strategy, index and so on.
>>>>
>>>> Recently, I have also thought about the situation of CONSTRAINTS
>>>> keyword. I think it is similar to data mask. Agree that it should be
>>>> included in the basic options of table copy (column name, column data type,
>>>> column data mask and constraints).
>>>>
>>>> Dave Herrington wrote on Sat, Oct 19, 2024 at 01:15:
>>>>
>>>>> It seems like a natural extension of the CREATE TABLE statement.
>>>>> Looking forward to using it in the future.
>>>>>
>>>>> -Dave
>>>>>
>>>>> On Thu, Oct 17, 2024 at 5:11 PM Štefan Miklošovič <
>>>>> smikloso...@apache.org> wrote:
>>>>>
>>>>>> Right?! Reads like English, the impact on the existing CQL is
>>>>>> minimal. One LIKE which basically needs to be there and keywords of
>>>>>> logical "components" which seamlessly integrate with WITH.
>>>>>>
>>>>>> I would _not_ use WITH CONSTRAINTS because constraints will be
>>>>>> inherently part of a table schema. It is not an "option". We can not
>>>>>> "opt-out" from them. Remember we are copying a table here so if a base
>>>>>> one has constraints, its copy will have them too. A user can subsequently
>>>>>> "ALTER" them.
>>>>>>
>>>>>> On Thu, Oct 17, 2024 at 5:31 PM Dave Herrington <
>>>>>> he...@rhinosource.com> wrote:
>>>>>>
>>>>>>> Basing it on CREATE TABLE, the BNF definition of the simple
>>>>>>> implementation would look something like this:
>>>>>>>
>>>>>>> create_table_statement::= CREATE TABLE [ IF NOT EXISTS ] table_name
>>>>>>> LIKE base_table_name
>>>>>>> [ WITH included_objects ] [ [ AND ] table_options ]
>>>>>>> table_options::= COMPACT STORAGE [ AND table_options ]
>>>>>>> | CLUSTERING ORDER BY '(' clustering_order ')'
>>>>>>> [ AND table_options ]  | options
>>>>>>> clustering_order::= column_name (ASC | DESC) ( ',' column_name (ASC
>>>>>>> | DESC) )*
>>>>>>> included_objects::= dependent_objects [ AND dependent_objects ]
>>>>>>> dependent_objects::= INDEXES | TRIGGERS | CONSTRAINTS | VIEWS
>>>>>>>
>>>>>>>
>>>>>>> CREATE TABLE [ IF NOT EXISTS ] [.] LIKE
>>>>>>> [.]
>>>>>>>   [ WITH [  ]
>>>>>>>   [ [ AND ] [  ] ]
>>>>>>>   [ [ AND ] CLUSTERING ORDER BY [  (ASC |
>>>>>>> DESC) ] ]
>>>>>>> ;
>>>>>>>
>>>>>>> Examples:
>>>>>>>
>>>>>>> -- Create base table:
>>>>>>> CREATE TABLE cycling.cyclist_name (
>>>>>>>   id UUID PRIMARY KEY,
>>>>>>>   lastname text,
>>>>>>>   firstname text
>>>>>>> );
>>>>>>>
>>>>>>> -- Create an exact copy of the base table, but do not create any
>>>>>>> dependent objects:
>>>>>>> CREATE TABLE cycling.cyclist_name2 LIKE cycling.cyclist_name;
>>>>>>>
>>>>>>> -- Create an exact copy with all dependent objects (constraints

Re: [DISCUSS] Introduce CREATE TABLE LIKE grammer

2024-10-22 Thread Štefan Miklošovič
I just don't see OPTIONS as important. When I want to copy a table, I am
copying a table _with everything_. Options included, by default. Why would
I want to have a copy of a table with options different from the base one?


On Mon, Oct 21, 2024 at 3:55 PM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> Hi Guo,
>
> +1 for the CONSTRAINTS keyword to be added into the default behavior.
>
> Bernardo
>
> On Oct 21, 2024, at 12:01 AM, guo Maxwell  wrote:
>
> I think the CONSTRAINTS keyword may be in the same situation as
> datamask.
> Maybe it is better to include  constraints into  the default behavior of
> table copy together with column name, column data type and data mask.
>
> guo Maxwell wrote on Mon, Oct 21, 2024 at 14:56:
>
>> To yifan :
>> I don't mind adding the ALL keyword, and it has been updated into CEP.
>>
>> As all you can see, our original intention was that the grammar would not
>> be too complicated, which is what I described in cep
>> 
>> .
>> We gave up PG-related grammar, including INCLUDING/EXCLUDING and so on .
>>
>> guo Maxwell wrote on Mon, Oct 21, 2024 at 14:52:
>>
>>> Hi,
>>> To Stefan:
>>> I may want to explain that if there is no OPTION keyword in the CQL
>>> statement, then the newly created table will only have the
>>> original table's column name, column type and data mask. I think this is
>>> the most basic choice when copying tables to users.
>>> Then we do some addition, we can add original table's table options
>>> like compaction strategy/compress strategy, index and so on.
>>>
>>> Recently, I have also thought about the situation of CONSTRAINTS
>>> keyword. I think it is similar to data mask. Agree that it should be
>>> included in the basic options of table copy (column name, column data type,
>>> column data mask and constraints).
>>>
>>> Dave Herrington wrote on Sat, Oct 19, 2024 at 01:15:
>>>
>>>> It seems like a natural extension of the CREATE TABLE statement.
>>>> Looking forward to using it in the future.
>>>>
>>>> -Dave
>>>>
>>>> On Thu, Oct 17, 2024 at 5:11 PM Štefan Miklošovič <
>>>> smikloso...@apache.org> wrote:
>>>>
>>>>> Right?! Reads like English, the impact on the existing CQL is minimal.
>>>>> One LIKE which basically needs to be there and keywords of logical
>>>>> "components" which seamlessly integrate with WITH.
>>>>>
>>>>> I would _not_ use WITH CONSTRAINTS because constraints will be
>>>>> inherently part of a table schema. It is not an "option". We can not
>>>>> "opt-out" from them. Remember we are copying a table here so if a base one
>>>>> has constraints, its copy will have them too. A user can subsequently
>>>>> "ALTER" them.
>>>>>
>>>>> On Thu, Oct 17, 2024 at 5:31 PM Dave Herrington
>>>>> wrote:
>>>>>
>>>>>> Basing it on CREATE TABLE, the BNF definition of the simple
>>>>>> implementation would look something like this:
>>>>>>
>>>>>> create_table_statement::= CREATE TABLE [ IF NOT EXISTS ] table_name
>>>>>> LIKE base_table_name
>>>>>> [ WITH included_objects ] [ [ AND ] table_options ]
>>>>>> table_options::= COMPACT STORAGE [ AND table_options ]
>>>>>> | CLUSTERING ORDER BY '(' clustering_order ')'
>>>>>> [ AND table_options ]  | options
>>>>>> clustering_order::= column_name (ASC | DESC) ( ',' column_name (ASC |
>>>>>> DESC) )*
>>>>>> included_objects::= dependent_objects [ AND dependent_objects ]
>>>>>> dependent_objects::= INDEXES | TRIGGERS | CONSTRAINTS | VIEWS
>>>>>>
>>>>>>
>>>>>> CREATE TABLE [ IF NOT EXISTS ] [.] LIKE
>>>>>> [.]
>>>>>>   [ WITH [  ]
>>>>>>   [ [ AND ] [  ] ]
>>>>>>   [ [ AND ] CLUSTERING ORDER BY [  (ASC |
>>>>>> DESC) ] ]
>>>>>> ;
>>>>>>
>>>>>> Examples:
>>>>>>
>>>>>> -- Create base table:
>>>>>> CREATE TABLE cycling.cyclist_name (
>>>>>>   id UUID PRIMARY KEY,
>>>>>>   lastname text,
>>>>>>   firstname text
>>>>>> );
>>>>>>
>>>>>> -- Create an exact copy of the base table, but do not create any
>>>>>> dependent objects:
>>>>>> CREATE TABLE cycling.cyclist_name2 LIKE cycling.cyclist_name;
>>>>>>
>>>>>> -- Create an exact copy with all dependent objects (constraints
>>>>>> excluded for now):
>>>>>> CREATE TABLE cycling.cyclist_name3 LIKE cycling.cyclist_name
>>>>>> WITH INDEXES AND TRIGGERS AND VIEWS;
>>>>>>
>>>>>> -- Create a copy with LCS compaction, a default TTL and all dependent
>>>>>> objects except indexes:
>>>>>> CREATE TABLE cycling.cyclist_name4 LIKE cycling.cyclist_name
>>>>>> WITH TRIGGERS AND VIEWS
>>>>>> AND compaction = { 'class' :  'LeveledCompactionStrategy' }
>>>>>> AND default_time_to_live = 86400;
>>>>>>
>>>>>>
>>>>>>
>>>>>> This seems pretty clean & straightforward.
>>>>>>
>>>>>> -Dave
>>>>>>
>>>>>> On Thu, Oct 17, 2024 at 4:05 PM Dave Herrington <
>>>>>> he...@rhinosource.com> wrote:
>>>>>>
>>>>>>> This simple approach resonates with me.  I think the Cassandra doc
>>>>>>> uses "INDEXES" as the plural for index, i.e.:
>>>>>>> https://cassandra.apache.org/doc/stable/cassandra/cql/indexes.html
>>>>>>>
>>>>>>> -Dave
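
To make the simplest part of Dave's quoted BNF concrete, here is a rough recognizer for the `CREATE TABLE ... LIKE ... [ WITH obj AND obj ... ]` form. It is a sketch under assumptions, not Cassandra's parser: it handles only the dependent-object keywords from the BNF (INDEXES, TRIGGERS, CONSTRAINTS, VIEWS), ignores table options such as `compaction = {...}`, and `parse_create_table_like` is a name invented for this example.

```python
import re

DEPENDENT_OBJECTS = {"INDEXES", "TRIGGERS", "CONSTRAINTS", "VIEWS"}

# Simplified statement shape; table_options from the BNF are deliberately left out.
_STATEMENT = re.compile(
    r"""^\s*CREATE\s+TABLE\s+
        (?P<if_not_exists>IF\s+NOT\s+EXISTS\s+)?
        (?P<target>[\w.]+)\s+
        LIKE\s+(?P<base>[\w.]+)
        (?:\s+WITH\s+(?P<objects>\w+(?:\s+AND\s+\w+)*))?
        \s*;?\s*$""",
    re.IGNORECASE | re.VERBOSE,
)


def parse_create_table_like(statement):
    """Return target, base and requested dependent objects, or raise ValueError."""
    match = _STATEMENT.match(statement)
    if match is None:
        raise ValueError(f"not a supported CREATE TABLE ... LIKE statement: {statement!r}")
    objects = []
    if match.group("objects"):
        parts = re.split(r"\s+AND\s+", match.group("objects"), flags=re.IGNORECASE)
        objects = [part.upper() for part in parts]
        unknown = set(objects) - DEPENDENT_OBJECTS
        if unknown:
            raise ValueError(f"unknown dependent objects: {sorted(unknown)}")
    return {
        "if_not_exists": match.group("if_not_exists") is not None,
        "target": match.group("target"),
        "base": match.group("base"),
        "objects": objects,
    }


if __name__ == "__main__":
    print(parse_create_table_like(
        "CREATE TABLE cycling.cyclist_name3 LIKE cycling.cyclist_name\n"
        "WITH INDEXES AND TRIGGERS AND VIEWS;"
    ))
```

A keyword outside the BNF list, such as the OPTIONS keyword debated elsewhere in this thread, is rejected by this sketch; whether the real grammar accepts it is exactly what the thread is still deciding.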

Re: [DISCUSS] Donating easy-cass-stress to the project

2024-10-22 Thread Jordan West
Josh/Mick, where does that leave us? I’d like to start with the smaller
scope Josh described in his last email. We can tackle in-tree/stress
separately.

I was going to start working on getting signed ICLAs. Does that still sound
like the right next step? Or is that also not necessary unless we take the
approach originally described by Mick?

Jordan

On Tue, Oct 15, 2024 at 04:23 Josh McKenzie  wrote:

> IIUC there's no subproject involved here.
>
> To elaborate a touch: this isn't a subproject in terms of governance. i.e.
> no 3 dedicated PMC sponsors required, no "pmc must legally vote on
> releasing an artifact" (unless of course we start independently releasing
> artifacts for it as opposed to just pointing people at the repo and tags),
> and no real "we must have at least N people with a commit-bit and a focus
> on this area for it to be taken in".
>
> Makes sense given the context and purpose of the tool and keeping it
> lighter weight should make it easier for everyone to collaborate on it, to
> Mick's point - much like the ccm and dtest repos.
>
> On Tue, Oct 15, 2024, at 3:19 AM, Mick Semb Wever wrote:
>
>
> IIUC there's no subproject involved here.  This is a separate repository
> coming in, akin to cassandra-dtest (plus releases).
>
> The question wrt replacing cassandra-stress was only thinking about
> something down the road, to help smoke out stuff like compaction-stress.
> No suggestion implied that easy-cass-stress should be moved in-tree.
>
>
>
> On Tue, 15 Oct 2024 at 00:15, David Capwell  wrote:
>
> I think we should just accept easy-cass-stress as a subproject and go
> from there.  Replacing stress can be handled separately and still has
> the large issue of reconciling the build systems that I raised in the
> beginning of this thread, but can be figured out eventually.
>
>
> I strongly agree with you here. The proposal is to just add the project
>
>
> On Oct 14, 2024, at 11:08 AM, C. Scott Andreas 
> wrote:
>
> Separating the two is completely fine yep -- just mentioned since
> deprecation/removal of stress also came up in the thread.
>
> Let's complete the donation. Just wanted to make sure we don't remove
> compaction-stress without a substitute.
>
> – Scott
>
> On Oct 14, 2024, at 10:46 AM, Brandon Williams  wrote:
>
>
> I think we should just accept easy-cass-stress as a subproject and go
> from there. Replacing stress can be handled separately and still has
> the large issue of reconciling the build systems that I raised in the
> beginning of this thread, but can be figured out eventually.
>
> Kind Regards,
> Brandon
>
> On Mon, Oct 14, 2024 at 12:41 PM Jon Haddad 
> wrote:
>
>
> Scott, I think introducing replacing compaction stress as a requirement
> here adds unnecessary friction to the donation process. I'd prefer to avoid
> coupling the two things. Unless you or someone else is volunteering to
> rewrite it I think this would effectively halt the donation, which I doubt
> is your intention. Can we do that as a separate thing?
>
> Regarding the name, I'm fine if we rename it. My tooling is easy-cass-*,
> and renaming it would make it clear that it's no longer my project, that's
> fine with me.
>
> Jon
>
>
> On Sun, Oct 13, 2024 at 8:20 PM  wrote:
>
>
> Supportive and would welcome the contribution as well. Jon, thanks for
> your willingness to offer this work to the Foundation.
>
> Also supportive of considering easy-cass-stress the successor to
> cassandra-stress.
>
> I’m fine with a directional goal of deprecating and removing
> cassandra-stress, but would like to make sure we have a successor to
> compaction-stress before doing so. I very rarely use cassandra-stress, but
> compaction-stress is helpful for generating a large corpus of SSTables and
> allowing compaction to churn through them. This is great for benching
> changes to the read path, compaction strategies, and for evaluation of
> hardware/VM/IO performance.
>
>
> https://github.com/apache/cassandra/blob/trunk/tools/stress/src/org/apache/cassandra/stress/CompactionStress.java
>
> Apologies if this exists in easy-cass-stress today - I may have missed it.
> Our own documentation even lacks a mention of compaction-stress. :)
>
> – Scott
>
> On Oct 13, 2024, at 8:01 PM, Štefan Miklošovič 
> wrote:
>
> * easy-cass-stress, sorry. Everything else holds.
>
> On Sun, Oct 13, 2024 at 9:00 PM Štefan Miklošovič 
> wrote:
>
>
> What does "replacing" actually mean? If this tool is added to a separate
> repository, you mean like it would be put there under the "easy-cass-lab"
> name and all source code of cassandra-stress in the Cassandra repository
> would be removed? Are we going to deprecate what we have first or it is
> going to be a big bang?
>
> Should not be easy-cass-lab renamed to "cassandra-stress"? I do not think
> that "easy-cass-lab" should be the name of a repo we are going to use. For
> a custom tool living outside of Cassandra until now, sure, but the official
> stress tool should not be called "easy-cass-lab". Pe

Re: [DISCUSS] Introduce CREATE TABLE LIKE grammer

2024-10-22 Thread guo Maxwell
The reason for using the OPTION keyword is that I want to provide users with
more choices.
The default behavior for copying a table is to copy the basic items of the table
(columns and their data type, mask, constraint); other things that belong to the
table, like options, views and triggers, are optional in my mind.
You are absolutely right that users may want to copy everything, but I think
there are also some users who want to copy only the basic column information, so I
just give them a choice. As we know, the number of table parameters is
not small: compression, compaction, gc_seconds, bf_chance, speculative_retry and
so on.

Besides, we can see that PG also has the keywords COMMENT and COMPRESSION, which
behave similarly to our OPTION keyword.

So that is why I added the OPTION keyword.
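
The two behaviours being debated can be sketched as follows: column definitions are always copied, while table options are copied only when explicitly requested. This is a toy model, not the CEP's implementation: `Table`, `create_table_like`, `with_options` and the values in `DEFAULT_OPTIONS` are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Illustrative defaults only; real Cassandra defaults depend on the version.
DEFAULT_OPTIONS = {
    "compaction": "SizeTieredCompactionStrategy",
    "default_time_to_live": 0,
}


@dataclass
class Table:
    name: str
    columns: dict  # column name -> (type, mask, constraint)
    options: dict = field(default_factory=lambda: dict(DEFAULT_OPTIONS))


def create_table_like(base, new_name, with_options=False):
    """Always copy columns; copy table options only when with_options is set."""
    options = dict(base.options) if with_options else dict(DEFAULT_OPTIONS)
    return Table(name=new_name, columns=dict(base.columns), options=options)


base = Table(
    name="ks.tb",
    columns={"id": ("uuid", None, None), "lastname": ("text", None, None)},
    options={"compaction": "LeveledCompactionStrategy", "default_time_to_live": 86400},
)

minimal = create_table_like(base, "ks.tb_copy")                  # proposed default
full = create_table_like(base, "ks.tb_copy2", with_options=True)  # with the OPTION keyword

print(minimal.columns == base.columns, minimal.options == base.options)
print(full.columns == base.columns, full.options == base.options)
```

In this model the minimal statement yields identical columns but default options, which is the behaviour Štefan objects to; flipping the default of `with_options` to `True` would model his preference for an exact copy with an opt-out.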


Štefan Miklošovič wrote on Tue, Oct 22, 2024 at 11:40 PM:

> The problem is that when I do this minimal CQL which shows this feature:
>
> CREATE TABLE ks.tb_copy LIKE ks.tb;
>
> then you are saying that when I _do not_ specify WITH OPTIONS then I get
> Cassandra's defaults. Only after I specify WITH OPTIONS, it would truly be
> a copy.
>
> This is not a good design. Because to have an exact copy, I have to make a
> conscious effort to include OPTIONS as well. That should not be the case. I
> just want to have a copy, totally the same stuff, when I use the minimal
> version of that statement. It would be better to opt-out from options like
>
> CREATE TABLE ks.tb_copy LIKE ks.tb WITHOUT OPTIONS (you feel me) but we do
> not support this (yet).
>
> On Tue, Oct 22, 2024 at 5:28 PM Štefan Miklošovič 
> wrote:
>
>> I just don't see OPTIONS as important. When I want to copy a table, I am
>> copying a table _with everything_. Options included, by default. Why would
>> I want to have a copy of a table with options different from the base one?
>>
>>
>> On Mon, Oct 21, 2024 at 3:55 PM Bernardo Botella <
>> conta...@bernardobotella.com> wrote:
>>
>>> Hi Guo,
>>>
>>> +1 for the CONSTRAINTS keyword to be added into the default behavior.
>>>
>>> Bernardo
>>>
>>> On Oct 21, 2024, at 12:01 AM, guo Maxwell  wrote:
>>>
>>> I think the CONSTRAINTS keyword may be in the same situation as
>>> datamask.
>>> Maybe it is better to include constraints in the default behavior of
>>> table copy, together with column name, column data type and data mask.
>>>
>>> On Mon, Oct 21, 2024 at 14:56, guo Maxwell wrote:
>>>
 To Yifan:
 I don't mind adding the ALL keyword, and it has been added to the CEP.

 As you can all see, our original intention was that the grammar would
 not be too complicated, which is what I described in the CEP.
 We gave up PG-related grammar, including INCLUDING/EXCLUDING and so on.

 On Mon, Oct 21, 2024 at 14:52, guo Maxwell wrote:

> Hi,
> To Stefan:
> I want to explain that if there is no OPTION keyword in the CQL
> statement, then the newly created table will only have the
> original table's column names, column types and data masks. I think this is
> the most basic choice to offer users when copying tables.
> On top of that we can add the original table's table options,
> such as compaction strategy, compression strategy, indexes and so on.
>
> Recently, I have also thought about the CONSTRAINTS keyword. I think it is
> similar to data mask. I agree that it should be included in the basic items
> of a table copy (column name, column data type, column data mask and
> constraints).
>
> On Sat, Oct 19, 2024 at 01:15, Dave Herrington wrote:
>
>> It seems like a natural extension of the CREATE TABLE statement.
>> Looking forward to using it in the future.
>>
>> -Dave
>>
>> On Thu, Oct 17, 2024 at 5:11 PM Štefan Miklošovič <
>> smikloso...@apache.org> wrote:
>>
>>> Right?! Reads like English, the impact on the existing CQL is
>>> minimal. One LIKE which basically needs to be there and keywords of
>>> logical "components" which seamlessly integrate with WITH.
>>>
>>> I would _not_ use WITH CONSTRAINTS because constraints will be
>>> inherently part of a table schema. It is not an "option". We can not
>>> "opt-out" from them. Remember we are copying a table here so if a base
>>> one has constraints, its copy will have them too. A user can subsequently
>>> "ALTER" them.
>>>
>>> On Thu, Oct 17, 2024 at 5:31 PM Dave Herrington <
>>> he...@rhinosource.com> wrote:
>>>
 Basing it on CREATE TABLE, the BNF definition of the simple
 implementation would look something like this:

 create_table_statement::= CREATE TABLE [ IF NOT EXISTS ] table_name
 LIKE base_table_name
 [ WITH included_objects ] [ [ AND ] table_options ]
 table_options::= COMPACT STORAGE [ AND table_options ]
 | CLUSTERING ORDER BY '(' clustering_order ')'

Re: CEP-32: Open-Telemetry integration

2024-10-22 Thread Michael Burman
Hi,

> I'd really, really like to see us ship a Prom compatible metrics endpoint
> out of the box in C* that has low overhead.  All the current OSS metrics
> exporters that I've seen have massive overhead.  I'm specifically looking
> for sub-10s collection on clusters with a thousand nodes and 500+ tables.
> That means going directly to DropWizard and skipping JMX.

This is what we're doing in the management-api metrics endpoint. We poll
the DropWizard metrics directly and then modify the values to a standard
Prometheus output. The design goals included fast performance and near-zero
GC load. I tested the implementation using 8000 tables and on my old laptop
I was able to read some ~24 million datapoints per second. At that point,
the constraints are on the network side (even with compression which we
support) and what Prometheus / Mimir / Thanos / etc are able to receive.

In reality, the TSDB is always going to be the limiting side, not what we
can parse from Cassandra. Since filtering on the Prometheus polling side
was too slow and would require first transferring all the data there, we
opted to implement also the replacements directive of Prometheus on the
server side, so one can for example filter out all the table metrics before
anything is transferred. That helps in our testing with the processing load
on the Prometheus side (since large amounts of metrics would overwhelm the
Prometheus).

The implementation is available here:

https://github.com/k8ssandra/management-api-for-apache-cassandra/tree/master/management-api-agent-common/src/main/java/io/k8ssandra/metrics

There's no need to use the rest of the management-api features if you don't
want to, simply deploy the agent to get access to this output (it will
answer on localhost:9000/metrics).

We do implement some extra metrics that are not available in Cassandra
DropWizard also (such as per compaction / per streaming process status),
but these are separated under namespace
org_apache_cassandra_metrics_extended_*
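
As a rough illustration of the idea described above (read metric values directly, render them as Prometheus text, and filter on the server before anything goes over the wire), here is a minimal Python sketch. It is not the management-api implementation: the function name, the single `drop_pattern` parameter and the sample metric names are assumptions for illustration, and real relabeling rules are richer than one drop regex.

```python
import re

_INVALID = re.compile(r"[^a-zA-Z0-9_]")


def to_prometheus(metrics, drop_pattern=None):
    """Render {dotted.metric.name: value} as Prometheus text lines.

    Metrics whose sanitized name matches drop_pattern are skipped before
    rendering, so they are never serialized or transferred.
    """
    drop = re.compile(drop_pattern) if drop_pattern else None
    lines = []
    for name, value in sorted(metrics.items()):
        # Prometheus metric names cannot contain dots, so replace them.
        prom_name = _INVALID.sub("_", name)
        if drop and drop.search(prom_name):
            continue
        lines.append(f"{prom_name} {value}\n")
    return "".join(lines)


# Hypothetical sample values; the names only mimic the DropWizard naming style.
sample = {
    "org.apache.cassandra.metrics.Table.ReadLatency.ks.tb": 12,
    "org.apache.cassandra.metrics.Client.connectedNativeClients": 3,
}
everything = to_prometheus(sample)
without_tables = to_prometheus(sample, drop_pattern=r"_metrics_Table_")
```

With 500+ tables, dropping the per-table series on the server side is what keeps both the payload and the load on the receiving TSDB small.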

  - Micke

On Thu, 3 Oct 2024 at 17:59, Patrick McFadin  wrote:

> So. That's a +1 from you, Jon? Just want to make sure.
>
> On Thu, Oct 3, 2024 at 7:17 AM Jon Haddad  wrote:
>
>> I love that we're having a discussion about observability.  A HUGE thank
>> you to anyone willing to invest time improving it in Cassandra.
>>
>> I'd really, really like to see us ship a Prom compatible metrics endpoint
>> out of the box in C* that has low overhead.  All the current OSS metrics
>> exporters that I've seen have massive overhead.  I'm specifically looking
>> for sub-10s collection on clusters with a thousand nodes and 500+ tables.
>> That means going directly to DropWizard and skipping JMX.
>>
>> I put together a POC of it a while ago here:
>> https://github.com/rustyrazorblade/cassandra-prometheus-exporter.
>> Please use commit 434be099d5983d537e2c70aad745194e575bc49a as a reference.
>> I wasn't expecting anyone to actually care about the repo and the last
>> commit broke it.  There's some optimizations that could be done to further
>> improve the exporter, I was working on that when I broke the repo :/
>>
>> For industry comparison the following DBs either ship entire monitoring
>> stacks or provide strong recommendations / solutions:
>>
>> * ScyllaDB: https://www.scylladb.com/product/scylladb-monitoring-stack/
>> * Cockroach:
>> https://www.cockroachlabs.com/docs/v24.2/ui-overview-dashboard
>> * Aerospike:
>> https://aerospike.com/docs/monitorstack/new/components-of-monitoring-stack
>> * MongoDB:
>> https://www.mongodb.com/products/platform/atlas-charts/dashboard
>> * Elastic:
>> https://www.elastic.co/guide/en/elasticsearch/reference/8.15/monitoring-production.html
>> * Redis: https://grafana.com/grafana/dashboards/12776-redis/
>>
>> Re: Logs - I wouldn't write off OTel logging [1].  OTel logs can be
>> tagged with metadata including the span allowing you to do some really
>> useful diagnostics.  It's a significant improvement over standard logging.
>>
>> Anyways - I don't have a strong opinion on how the CEPs are done.
>> Different ones or together, whichever works.  I hope we can finally get a
>> good metrics solution because that's an area of significant pain for end
>> users.  A lot of teams don't even have Cassandra dashboards because we
>> currently provide zero direction.
>>
>> Jon
>>
>> [1] https://opentelemetry.io/docs/specs/otel/logs/
>>
>> Logs can be correlated with the rest of observability data in a few
>> dimensions:
>>
>> * By the time of execution. Logs, traces and metrics can record the
>> moment of time or the range of time the execution took place. This is the
>> most basic form of correlation.
>>
>>  * By the execution context, also known as the trace context. It is a
>> standard practice to record the execution context (trace and span ids as
>> well as user-defined context) in the spans. OpenTelemetry extends this
>> practice to logs where possible by including TraceId and SpanId in the
>> LogRecords. This allows to directl

Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-22 Thread Patrick McFadin
I thought underscore was one of the allowed characters. Or it could be and
has just been restricted in the admin regex.

On Tue, Oct 22, 2024 at 6:53 AM Brandon Williams  wrote:

> It looks like I can create the subprojects myself with
> https://selfserve.apache.org but there is a small issue with the
> bikeshed: JIRA projects must be alphanumeric only.  So we can have
> CASSDRIVERPYTHON but not CASS-DRIVER-PYTHON.  I'm not a huge fan of
> everything stuck together like that, but maybe somebody is and I don't
> feel strongly enough to veto, so I'll open it back up to suggestions.
>
> Kind Regards,
> Brandon
>
> On Sun, Oct 20, 2024 at 12:51 PM Francisco Guerrero 
> wrote:
> >
> > Yeah +1 also to CASS-. I think it's widely understood in the
> community.
> >
> > On 2024/10/20 17:21:58 Jon Haddad wrote:
> > > Agreed. I think everyone involved with cassandra will recognize CASS.
> > > —
> > > Jon Haddad
> > > Rustyrazorblade Consulting
> > > rustyrazorblade.com
> > >
> > >
> > > On Sun, Oct 20, 2024 at 7:18 AM Josh McKenzie 
> wrote:
> > >
> > > > +1 to CASS- shorthand.
> > > >
> > > > Think you're going to just have to agree to disagree on this one
> Stefan;
> > > > clear majority consensus on it on this thread afaict.
> > > >
> > > > On Sat, Oct 19, 2024, at 8:55 AM, Mick Semb Wever wrote:
> > > >
> > > > Isn't it weird that you said that we should not save characters at
> the
> > > > cost of readability while we just use CASS everywhere except the main
> > > > project? Why do you think that having "CASS-" will make people
> > > > automatically think that this is Cassandra related. No other project
> in
> > > > Apache (afaik) makes the shortcuts like that.
> > > >
> > > >
> > > >
> > > > Browsing
> > > >
> https://issues.apache.org/jira/secure/BrowseProjects.jspa?selectedCategory=all&selectedProjectType=all
> > > >
> > > >  there's plenty of shortcuts and abbreviations…
> > > >
> > > > AAR, ACL, AMQNET, APLO, AMQCPP, …
> > > >
> > > > I don't see any pattern or precedent there… :shrug:
> > > >
> > > > (bikeshedding)
> > > > So I'm entirely ok w/ the CASS shorthand, it's project-wide and
> > > > intuitive.  It also provides a pattern that clearly categorises us
> neatly
> > > > compared to other apache projects.
> > > >
> > > >
> > > >
> > >
>


Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-22 Thread Brandon Williams
Well, I don't see any TLPs with anything but alphanumeric, and only
one of those contains a number.

Kind Regards,
Brandon

On Tue, Oct 22, 2024 at 9:12 AM Patrick McFadin  wrote:
>
> I thought underscore was one of the allowed characters. Or it could be and 
> has just been restricted in the admin regex.
>
> On Tue, Oct 22, 2024 at 6:53 AM Brandon Williams  wrote:
>>
>> It looks like I can create the subprojects myself with
>> https://selfserve.apache.org but there is a small issue with the
>> bikeshed: JIRA projects must be alphanumeric only.  So we can have
>> CASSDRIVERPYTHON but not CASS-DRIVER-PYTHON.  I'm not a huge fan of
>> everything stuck together like that, but maybe somebody is and I don't
>> feel strongly enough to veto, so I'll open it back up to suggestions.
>>
>> Kind Regards,
>> Brandon
>>
>> On Sun, Oct 20, 2024 at 12:51 PM Francisco Guerrero  
>> wrote:
>> >
>> > Yeah +1 also to CASS-. I think it's widely understood in the community.
>> >
>> > On 2024/10/20 17:21:58 Jon Haddad wrote:
>> > > Agreed. I think everyone involved with cassandra will recognize CASS.
>> > > —
>> > > Jon Haddad
>> > > Rustyrazorblade Consulting
>> > > rustyrazorblade.com
>> > >
>> > >
>> > > On Sun, Oct 20, 2024 at 7:18 AM Josh McKenzie  
>> > > wrote:
>> > >
>> > > > +1 to CASS- shorthand.
>> > > >
>> > > > Think you're going to just have to agree to disagree on this one 
>> > > > Stefan;
>> > > > clear majority consensus on it on this thread afaict.
>> > > >
>> > > > On Sat, Oct 19, 2024, at 8:55 AM, Mick Semb Wever wrote:
>> > > >
>> > > > Isn't it weird that you said that we should not save characters at the
>> > > > cost of readability while we just use CASS everywhere except the main
>> > > > project? Why do you think that having "CASS-" will make people
>> > > > automatically think that this is Cassandra related. No other project in
>> > > > Apache (afaik) makes the shortcuts like that.
>> > > >
>> > > >
>> > > >
>> > > > Browsing
>> > > > https://issues.apache.org/jira/secure/BrowseProjects.jspa?selectedCategory=all&selectedProjectType=all
>> > > >
>> > > >  there's plenty of shortcuts and abbreviations…
>> > > >
>> > > > AAR, ACL, AMQNET, APLO, AMQCPP, …
>> > > >
>> > > > I don't see any pattern or precedent there… :shrug:
>> > > >
>> > > > (bikeshedding)
>> > > > So I'm entirely ok w/ the CASS shorthand, it's project-wide and
>> > > > intuitive.  It also provides a pattern that clearly categorises us 
>> > > > neatly
>> > > > compared to other apache projects.
>> > > >
>> > > >
>> > > >
>> > >


Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-22 Thread Martin Sucha
This seems to be relevant documentation:
https://confluence.atlassian.com/adminjiraserver/changing-the-project-key-format-938847081.html

Martin

-- 
This email, including attached files, may contain confidential information 
and is intended only for the use of the individual and/or entity to which 
it is addressed. If you are not the intended recipient, disclosure, 
copying, use, or distribution of the information included in this email 
and/or in its attachments is prohibited.
If you have received it by 
mistake, please do not read, copy or use it, or disclose its contents to 
others. Please notify the sender that you have received this email by 
mistake by replying to the email, and then delete the email and any copies 
and attachments of it. Thank you.


Re: [Discuss] Repair inside C*

2024-10-22 Thread Joseph Lynch
Definitely like this in C* itself. We only changed our proposal to putting
repair scheduling in the sidecar before because trunk was frozen for the
foreseeable future at that time. With trunk unfrozen and development on the
main process going at a fast pace I think it makes way more sense to
integrate natively as table properties as this CEP proposes. Completely
agree the scheduling overhead should be minimal.

Moving the actual repair operation (comparing data and streaming
mismatches) along with compaction operations to a separate process long
term makes a lot of sense but imo only once we both have a release of
sidecar and a contract figured out between them on communication. I'm
watching CEP-38 there as I think CQL and virtual tables are looking much
stronger than when we wrote CEP-1 and chose HTTP but that's for that
discussion and not this one.

-Joey

On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero 
wrote:

> Like others have said, I was expecting the scheduling portion of repair to
> be negligible. I was mostly curious if you had something handy that you can
> quickly share.
>
> On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
> > >Jaydeep, do you have any metrics on your clusters comparing them before
> > and after introducing repair scheduling into the Cassandra process?
> >
> > Yes, I had made some comparisons when I started rolling this feature out
> > to our production five years ago :)  Here are the details:
> > *The Scheduling*
> > The scheduling itself is exceptionally lightweight, as only one additional
> > thread monitors the repair activity, updating the status to a system table
> > once every few minutes or so. So, it does not appear anywhere in the CPU
> > charts, etc. Unfortunately, I do not have those graphs now, but I can do a
> > quick comparison if it helps!
> >
> > *The Repair Itself*
> > As we all know, the Cassandra repair algorithm is a heavy-weight process
> > due to Merkle tree/streaming, etc., no matter how we schedule it. But it is
> > an orthogonal topic and folks are already discussing creating a new CEP.
> >
> > Jaydeep
> >
> >
> > On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero 
> > wrote:
> >
> > > Jaydeep, do you have any metrics on your clusters comparing them before
> > > and after introducing repair scheduling into the Cassandra process?
> > >
> > > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
> > > > > Sounds good. Just wanted to bring it up. I agree that the scheduling
> > > > > bit is pretty lightweight and the ideal would be to bring the whole of
> > > > > the repair external, which is a much bigger can of worms to open.
> > > >
> > > >
> > > >
> > > > -Jeremiah
> > > >
> > > >
> > > >
> > > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink 
> > > wrote:
> > > > >
> > > > >
> > > >
> > > > > 
> > > > >
> > > > > > > I actually think we should be looking at how we can move things out
> > > > > > > of the database process.
> > > > > >
> > > > > > While worth pursuing, I think we would need a different CEP just to
> > > > > > figure out how to do that. Not only is there a lot of infrastructure
> > > > > > difficulty in running multi process, the inter app communication needs
> > > > > > to be figured out better than JMX. Even with the sidecar we don't have
> > > > > > a solid story on how to ensure both are running or anything yet. It's
> > > > > > up to each app owner to figure it out. Once we have a good thing in
> > > > > > place I think we can start moving compactions, repairs, etc. out of
> > > > > > the database. Even then it's the _repairs_ that is expensive, not the
> > > > > > scheduling.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan
> > > > > <[jeremiah.jor...@gmail.com](mailto:jeremiah.jor...@gmail.com)>
> > > wrote:
> > > > >
> > > > >
> > > >
> > > > > >> I love the idea of a repair service being there by default for an
> > > > > >> install of C*.  My main concern here is that it is putting more
> > > > > >> services into the main database process.  I actually think we should
> > > > > >> be looking at how we can move things out of the database process.
> > > > > >> The C* process being a giant monolith has always been a pain point.
> > > > > >> Is there any way it makes sense for this to be an external process
> > > > > >> rather than a new thread pool inside the C* process?
> > > >
> > > > >>
> > > >
> > > > >>
> > > > >
> > > > >>
> > > >
> > > > >> -Jeremiah Jordan
> > > >
> > > > >>
> > > >
> > > > >>
> > > > >
> > > > >>
> > > >
> > > > >> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever
> > > > <[m...@apache.org](mailto:m...@apache.org)> wrote:
> > > > >
> > > > >>
> > > >
> > > > >>>
> > > > >
> > > > >>>
> > > >
> > > > >>> This is looking strong, thanks Jaydeep.
> > > >
> > > > >>>
> > > >
> > > > >>>
> > > > >
> > > > >>>
> > > >
> > > > >>> I would suggest folk take a look at the design doc and the PR in
> the
> > > CEP.
> > > > A l

Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-22 Thread Brandon Williams
It looks like I can create the subprojects myself with
https://selfserve.apache.org but there is a small issue with the
bikeshed: JIRA projects must be alphanumeric only.  So we can have
CASSDRIVERPYTHON but not CASS-DRIVER-PYTHON.  I'm not a huge fan of
everything stuck together like that, but maybe somebody is and I don't
feel strongly enough to veto, so I'll open it back up to suggestions.
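
A quick way to sanity-check candidate keys against the restriction described above is sketched below in Python. The exact rule is an assumption on my part (uppercase letters and digits only, starting with a letter); the pattern actually enforced by selfserve or the JIRA admin settings may differ.

```python
import re

# Assumed rule: uppercase letters and digits only, first character a letter.
# This is a hypothetical pattern, not copied from the JIRA configuration.
KEY_PATTERN = re.compile(r"[A-Z][A-Z0-9]+")


def is_valid_project_key(key):
    """Return True if key satisfies the assumed alphanumeric-only rule."""
    return KEY_PATTERN.fullmatch(key) is not None


candidates = ["CASSDRIVERPYTHON", "CASS-DRIVER-PYTHON", "CASS_DRIVER_PYTHON"]
results = {key: is_valid_project_key(key) for key in candidates}
```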

Kind Regards,
Brandon

On Sun, Oct 20, 2024 at 12:51 PM Francisco Guerrero  wrote:
>
> Yeah +1 also to CASS-. I think it's widely understood in the community.
>
> On 2024/10/20 17:21:58 Jon Haddad wrote:
> > Agreed. I think everyone involved with cassandra will recognize CASS.
> > —
> > Jon Haddad
> > Rustyrazorblade Consulting
> > rustyrazorblade.com
> >
> >
> > On Sun, Oct 20, 2024 at 7:18 AM Josh McKenzie  wrote:
> >
> > > +1 to CASS- shorthand.
> > >
> > > Think you're going to just have to agree to disagree on this one Stefan;
> > > clear majority consensus on it on this thread afaict.
> > >
> > > On Sat, Oct 19, 2024, at 8:55 AM, Mick Semb Wever wrote:
> > >
> > > Isn't it weird that you said that we should not save characters at the
> > > cost of readability while we just use CASS everywhere except the main
> > > project? Why do you think that having "CASS-" will make people
> > > automatically think that this is Cassandra related. No other project in
> > > Apache (afaik) makes the shortcuts like that.
> > >
> > >
> > >
> > > Browsing
> > > https://issues.apache.org/jira/secure/BrowseProjects.jspa?selectedCategory=all&selectedProjectType=all
> > >
> > >  there's plenty of shortcuts and abbreviations…
> > >
> > > AAR, ACL, AMQNET, APLO, AMQCPP, …
> > >
> > > I don't see any pattern or precedent there… :shrug:
> > >
> > > (bikeshedding)
> > > So I'm entirely ok w/ the CASS shorthand, it's project-wide and
> > > intuitive.  It also provides a pattern that clearly categorises us neatly
> > > compared to other apache projects.
> > >
> > >
> > >
> >


Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-22 Thread Štefan Miklošovič
So we will have stuff like

CASS_DRIVER_PYTHON and all tickets in CHANGES.txt as well as in the commit
messages will be like

CASS_DRIVER_PYTHON-1234

I checked (1) and there is not a single one which has underscores in its
name, now THAT would be a precedent, wouldn't it ...

(1)
https://issues.apache.org/jira/secure/BrowseProjects.jspa?selectedCategory=all&selectedProjectType=all


On Tue, Oct 22, 2024 at 4:17 PM Martin Sucha  wrote:

> This seems to be relevant documentation:
> https://confluence.atlassian.com/adminjiraserver/changing-the-project-key-format-938847081.html
>
> Martin


Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-22 Thread Brandon Williams
I don't think underscore is an option from selfserve anyway.  If we
have to stick everything together then I think having fewer things is
better, so we could drop the 'driver' and just name things like
CASSPYTHON.  WDYT?

Kind Regards,
Brandon

On Tue, Oct 22, 2024 at 9:33 AM Štefan Miklošovič
 wrote:
>
> So we will have stuff like
>
> CASS_DRIVER_PYTHON and all tickets in CHANGES.txt as well as in the commit 
> messages will be like
>
> CASS_DRIVER_PYTHON-1234
>
> I checked (1) and there is not a single one which has underscores in its 
> name, now THAT would be a precedent, wouldn't it ...
>
> (1) 
> https://issues.apache.org/jira/secure/BrowseProjects.jspa?selectedCategory=all&selectedProjectType=all
>
> On Tue, Oct 22, 2024 at 4:17 PM Martin Sucha  wrote:
>>
>> This seems to be relevant documentation: 
>> https://confluence.atlassian.com/adminjiraserver/changing-the-project-key-format-938847081.html
>>
>> Martin


Re: [Discuss] Repair inside C*

2024-10-22 Thread Dinesh Joshi
Decomposing Cassandra may be architecturally desirable but that is not the
goal of this CEP. This CEP brings value to operators today so it should be
considered on that merit. We definitely need to have a separate
conversation on Cassandra's architectural direction.

On Tue, Oct 22, 2024 at 7:51 AM Joseph Lynch  wrote:

> Definitely like this in C* itself. We only changed our proposal to putting
> repair scheduling in the sidecar before because trunk was frozen for the
> foreseeable future at that time. With trunk unfrozen and development on the
> main process going at a fast pace I think it makes way more sense to
> integrate natively as table properties as this CEP proposes. Completely
> agree the scheduling overhead should be minimal.
>
> Moving the actual repair operation (comparing data and streaming
> mismatches) along with compaction operations to a separate process long
> term makes a lot of sense but imo only once we both have a release of
> sidecar and a contract figured out between them on communication. I'm
> watching CEP-38 there as I think CQL and virtual tables are looking much
> stronger than when we wrote CEP-1 and chose HTTP but that's for that
> discussion and not this one.
>
> -Joey
>
> On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero 
> wrote:
>
>> Like others have said, I was expecting the scheduling portion of repair to
>> be negligible. I was mostly curious if you had something handy that you can
>> quickly share.
>>
>> On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
>> > >Jaydeep, do you have any metrics on your clusters comparing them before
>> > and after introducing repair scheduling into the Cassandra process?
>> >
>> > Yes, I had made some comparisons when I started rolling this feature out
>> > to our production five years ago :)  Here are the details:
>> > *The Scheduling*
>> > The scheduling itself is exceptionally lightweight, as only one additional
>> > thread monitors the repair activity, updating the status to a system table
>> > once every few minutes or so. So, it does not appear anywhere in the CPU
>> > charts, etc. Unfortunately, I do not have those graphs now, but I can do a
>> > quick comparison if it helps!
>> >
>> > *The Repair Itself*
>> > As we all know, the Cassandra repair algorithm is a heavy-weight process
>> > due to Merkle tree/streaming, etc., no matter how we schedule it. But it is
>> > an orthogonal topic and folks are already discussing creating a new CEP.
>> >
>> > Jaydeep
>> >
>> >
>> > On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero > >
>> > wrote:
>> >
>> > > Jaydeep, do you have any metrics on your clusters comparing them before
>> > > and after introducing repair scheduling into the Cassandra process?
>> > >
>> > > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
>> > > > Sounds good. Just wanted to bring it up. I agree that the scheduling
>> > > > bit is pretty lightweight and the ideal would be to bring the whole of
>> > > > the repair external, which is a much bigger can of worms to open.
>> > > >
>> > > >
>> > > >
>> > > > -Jeremiah
>> > > >
>> > > >
>> > > >
>> > > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink > >
>> > > wrote:
>> > > > >
>> > > > >
>> > > >
>> > > > > 
>> > > > >
>> > > > > > I actually think we should be looking at how we can move things out
>> > > > > > of the database process.
>> > > > >
>> > > > > While worth pursuing, I think we would need a different CEP just to
>> > > > > figure out how to do that. Not only is there a lot of infrastructure
>> > > > > difficulty in running multi process, the inter app communication needs
>> > > > > to be figured out better than JMX. Even with the sidecar we don't have
>> > > > > a solid story on how to ensure both are running or anything yet. It's
>> > > > > up to each app owner to figure it out. Once we have a good thing in
>> > > > > place I think we can start moving compactions, repairs, etc. out of
>> > > > > the database. Even then it's the _repairs_ that is expensive, not the
>> > > > > scheduling.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan
>> > > > > <[jeremiah.jor...@gmail.com](mailto:jeremiah.jor...@gmail.com)>
>> > > wrote:
>> > > > >
>> > > > >
>> > > >
>> > > > >> I love the idea of a repair service being there by default for an
>> > > > >> install of C*.  My main concern here is that it is putting more
>> > > > >> services into the main database process.  I actually think we should
>> > > > >> be looking at how we can move things out of the database process.
>> > > > >> The C* process being a giant monolith has always been a pain point.
>> > > > >> Is there any way it makes sense for this to be an external process
>> > > > >> rather than a new thread pool inside the C* process?
>> > > >
>> > > > >>
>> > > >
>> > > > >>
>> > > > >
>> > > > >>
>> > > >
>> > > > >> -Jeremiah

Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-22 Thread Štefan Miklošovič
Something like this?

CASSANDRA
CASSPYTHON
CASSGO
CASSJAVA
CASSSIDECAR
CASSANALYTICS

if we expand it would be like

CASSANDRA
CASSANDRAPYTHON
CASSANDRAGO
CASSANDRAJAVA
CASSANDRASIDECAR
CASSANDRAANALYTICS

I don't know ... the first form seems fine to me but that triple S in
CASSSIDECAR is strange. I just find myself counting S's when I type it.

Up to you guys. I don't mind either.

On Tue, Oct 22, 2024 at 5:01 PM Brandon Williams  wrote:

> I don't think underscore is an option from selfserve anyway.  If we
> have to stick everything together then I think having fewer things is
> better, so we could drop the 'driver' and just name things like
> CASSPYTHON.  WDYT?
>
> Kind Regards,
> Brandon
>
> On Tue, Oct 22, 2024 at 9:33 AM Štefan Miklošovič
>  wrote:
> >
> > So we will have stuff like
> >
> > CASS_DRIVER_PYTHON and all tickets in CHANGES.txt as well as in the
> commit messages will be like
> >
> > CASS_DRIVER_PYTHON-1234
> >
> > I checked (1) and there is not a single one which has underscores in its
> name, now THAT would be a precedent, wouldn't it ...
> >
> > (1)
> https://issues.apache.org/jira/secure/BrowseProjects.jspa?selectedCategory=all&selectedProjectType=all
> >
> > On Tue, Oct 22, 2024 at 4:17 PM Martin Sucha 
> wrote:
> >>
> >> This seems to be relevant documentation:
> https://confluence.atlassian.com/adminjiraserver/changing-the-project-key-format-938847081.html
> >>
> >> Martin


Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-22 Thread Patrick McFadin
That seems reasonable to me.

On Tue, Oct 22, 2024 at 8:01 AM Brandon Williams  wrote:
>
> I don't think underscore is an option from selfserve anyway.  If we
> have to stick everything together then I think having fewer things is
> better, so we could drop the 'driver' and just name things like
> CASSPYTHON.  WDYT?
>
> Kind Regards,
> Brandon
>
> On Tue, Oct 22, 2024 at 9:33 AM Štefan Miklošovič
>  wrote:
> >
> > So we will have stuff like
> >
> > CASS_DRIVER_PYTHON and all tickets in CHANGES.txt as well as in the commit 
> > messages will be like
> >
> > CASS_DRIVER_PYTHON-1234
> >
> > I checked (1) and there is not a single one which has underscores in its 
> > name, now THAT would be a precedent, wouldn't it ...
> >
> > (1) 
> > https://issues.apache.org/jira/secure/BrowseProjects.jspa?selectedCategory=all&selectedProjectType=all
> >
> > On Tue, Oct 22, 2024 at 4:17 PM Martin Sucha  wrote:
> >>
> >> This seems to be relevant documentation: 
> >> https://confluence.atlassian.com/adminjiraserver/changing-the-project-key-format-938847081.html
> >>
> >> Martin


Re: [Discuss] Repair inside C*

2024-10-22 Thread Dinesh Joshi
On Mon, Oct 21, 2024 at 9:18 AM David Capwell  wrote:

> One thing to keep in mind is that larger clusters require you “smartly”
> split the ranges else you nuke your cluster… knowing how to split requires
> internal knowledge from the database which we could expose, but then we
> need to expose a new public API (most likely a set of APIs) just to do
> this.  When you do the scheduling internal to the database you can make
> “breaking” changes that improve stability into a patch fix rather than have
> to wait for the next major…
>

As the project and its ecosystem grow, we need to have a conversation on
what a public API is. I do not want to derail this thread but very briefly,
we should make a distinction between `project internal` private API that is
exposed to Cassandra's components (which very well could run as a separate
local or remote process) and public API that the rest of the world outside
of the project uses. The backward compatibility expectations will be
different for `project internal` private API and public API.
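
For readers unfamiliar with the range-splitting point quoted above, here is a deliberately naive Python sketch of cutting a token range into equal-width subranges. It is not how Cassandra splits ranges for repair: it assumes a Murmur3-style signed 64-bit token space and ignores data distribution, which is exactly the internal knowledge the quoted message says a good splitter needs. The names are hypothetical.

```python
# Assumed bounds of a signed 64-bit token space (Murmur3-style partitioner).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_range(start, end, parts):
    """Split the token range (start, end] into contiguous subranges.

    Purely arithmetic: every subrange has (almost) the same width,
    regardless of how much data actually lives in it.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if end <= start:
        raise ValueError("end must be greater than start")
    width = end - start
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    # Skip empty subranges, which appear when parts exceeds the range width.
    return [(bounds[i], bounds[i + 1])
            for i in range(parts) if bounds[i] < bounds[i + 1]]


small = split_range(0, 100, 4)
ring = split_range(MIN_TOKEN, MAX_TOKEN, 16)
```

A scheduler living inside the database could replace the equal-width arithmetic with size-aware estimates without exposing any of this as a public API.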