Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Štefan Miklošovič
Wow this is quite a rabbit hole.

What is ultimately written into Chronicle Queue is what the
writeMarshallablePayload method on AbstractLogQuery puts into that WireOut.
If we take e.g. QUERY_OPTIONS into consideration, it writes the
queryOptionsBuffer, which is populated in AbstractLogEntry's constructor
(QueryOptions.codec.encode).

That takes the QueryOptions we deserialized from QueryMessage via the
codec's decode method and encodes it back into a ByteBuf. So for now, we
just serialize what we deserialized all over again.

But why do we need to serialize it again when logging it to FQL? The
"body" used here to decode the bytes into QueryOptions is already a
ByteBuf. So if we are going to write bytes, well, here we have them. The
decode / encode round trip does not seem necessary - can't we just use
these bytes as they are?

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L51
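A minimal sketch of that pass-through idea, using plain ByteBuffer code rather than Cassandra's actual codec classes (the class and method names below are illustrative only):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hedged sketch, not Cassandra's API: instead of decode(body) -> options ->
// encode(options) -> new buffer, the logger could duplicate the
// already-serialized wire bytes and write those directly.
final class PassThroughSketch
{
    // Wasteful round trip: pretend-decode to a String and re-encode it.
    static ByteBuffer roundTrip(ByteBuffer body)
    {
        String decoded = StandardCharsets.UTF_8.decode(body.duplicate()).toString();
        return StandardCharsets.UTF_8.encode(decoded); // fresh allocation + copy
    }

    // Pass-through: a read-only view over the same bytes, no decode/encode.
    static ByteBuffer passThrough(ByteBuffer body)
    {
        return body.asReadOnlyBuffer();
    }

    public static void main(String[] args)
    {
        ByteBuffer body = StandardCharsets.UTF_8.encode("SELECT * FROM t");
        ByteBuffer logged = passThrough(body);
        if (!logged.equals(roundTrip(body)))
            throw new AssertionError("both paths must yield identical bytes");
        System.out.println("pass-through matches round trip, no copy needed");
    }
}
```

The pass-through variant allocates nothing and touches no codec; both paths produce byte-identical output, which is the point being made above.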

On Thu, Sep 19, 2024 at 11:50 PM Benedict Elliott Smith 
wrote:

> Well, that looks like item number one to fix when we change the
> serialisation format. We should clearly not duplicate query strings we have
> recently logged.
>
> We do however appear to also serialise the bind variables, which benefit
> from being in the format we already have available in memory.
>
> On 19 Sep 2024, at 22:26, Štefan Miklošovič 
> wrote:
>
> I am not sure what you mean. I mean, I do, but I am not following. Look
> into FullQueryLogger (1): what it puts into the queue is the query as a
> String wrapped in a Query object. It literally takes a String as the
> representation of the query a user executed. We would just replace this by
> serializing that query to protobuf. What is counterproductive? We just
> replace one thing with another. Audit messages / events would be similar.
>
> (1)
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/fql/FullQueryLogger.java#L320-L321
>
> On Thu, Sep 19, 2024 at 11:17 PM J. D. Jordan 
> wrote:
>
>> I think reserializing the payload into a new format is counter productive
>> to some of the performance goals of the binary logs?
>> If you have to deserialize and reserialize the message you are going to
>> be throwing off a ton of extra GC.
>> I think we have done a lot of work in recent versions to reduce the amount
>> of re-serialization that happens in the query paths?  Not sure we want to
>> add some back in on purpose?  Keeping the payload in the internal
>> serialization format does indeed have the drawbacks David mentioned, but I
>> think “zero serialization overhead” is a pretty big advantage to keeping
>> things that way?
>>
>> -Jeremiah
>>
>> On Sep 19, 2024, at 3:56 PM, Štefan Miklošovič 
>> wrote:
>>
>> 
>> I think protobuf upon serialization is just a bunch of bytes anyway. If
>> we figure out the header as David showed then we can still serialize it all
>> with the machinery / serializers you mentioned. It can write bytes, right?!
>> I very briefly checked and I think that protobuf is super simple and does
>> not have any checksumming etc. so some sauce on top of that would be
>> necessary anyway and we can reuse what we have to produce binary files.
>>
>> On the consumer side, the binary file would be parsed with some tooling
>> e.g. in Go, indeed, but the headers and stuff would be so simple that it
>> would be just a coding exercise and then it might be deserialized with
>> protobuf for that language.
>>
>> Basically, only the payload itself would be the product of protobuf and
>> all around super simple to crack through.
>>
>> On Thu, Sep 19, 2024 at 10:41 PM Benedict  wrote:
>>
>>> Sorry, I missed that. I’m not convinced any of these logs need language
>>> agnostic tools for access, but if that’s a goal for other folk I don’t feel
>>> strongly about it.
>>>
>>> On 19 Sep 2024, at 21:06, Štefan Miklošovič 
>>> wrote:
>>>
>>> 
>>> More to it, it is actually not only about FQL. Audit logging is on
>>> Chronicle queues too so inspecting that would be platform independent as
>>> well.
>>>
>>> CEP-12 suggests that there might be a persistent store for diagnostic
>>> events as well. If somebody wants to inspect what a node was doing after it
>>> went offline as for now all these events are in memory only.
>>>
>>> This would basically enable people to fully inspect what the cluster was
>>> doing from FQL to Audit to Diagnostics in a language independent manner.
>>>
>>> On Thu, Sep 19, 2024 at 9:50 PM Štefan Miklošovič wrote:
>>>
 I think the biggest selling point for using something like protobuf is
 what David said - what if he wants to replay it in Go? Basing it on
 something language neutral enables people to replay it in whatever they
 want. If we have something totally custom then it is replayable just in
 Java without bringing tons of dependencies to their projects. That is the
 message I got from what he wrote.

 On Thu, Sep 19, 2024 at 9:47 PM B

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2024-09-19 Thread guo Maxwell
I discussed this offline with Claude; he is no longer working on this.

It's a pity. I think this is a very valuable thing. Commitlog archiving
and restore may be able to reuse the relevant code once it is completed.

Patrick McFadin wrote on Fri, Sep 20, 2024 at 2:01 AM:

> Thanks for reviving this one!
>
> On Wed, Sep 18, 2024 at 12:06 AM guo Maxwell  wrote:
>
>> Is there any update on this topic?  It seems that things could make big
>> progress if Jake Luciani can find someone who can make the
>> FileSystemProvider code accessible.
>>
>> Jon Haddad wrote on Sat, Dec 16, 2023 at 05:29:
>>
>>> At a high level I really like the idea of being able to better leverage
>>> cheaper storage especially object stores like S3.
>>>
>>> One important thing though - I feel pretty strongly that there's a big,
>>> deal-breaking downside. Backups, disk failure policies, snapshots, and
>>> possibly repairs - none of which have been particularly great in the past -
>>> would get more complicated, and of course there's the issue of failure recovery
>>> being only partially possible if you're looking at a durable block store
>>> paired with an ephemeral one with some of your data not replicated to the
>>> cold side.  That introduces a failure case that's unacceptable for most
>>> teams, which results in needing to implement potentially 2 different backup
>>> solutions.  This is operationally complex with a lot of surface area for
>>> headaches.  I think a lot of teams would probably have an issue with the
>>> big question mark around durability and I probably would avoid it myself.
>>>
>>> On the other hand, I'm +1 if we approach it something slightly
>>> differently - where _all_ the data is located on the cold storage, with the
>>> local hot storage used as a cache.  This means we can use the cold
>>> directories for the complete dataset, simplifying backups and node
>>> replacements.
>>>
>>> For a little background, we had a ticket several years ago where I
>>> pointed out it was possible to do this *today* at the operating system
>>> level as long as you're using block devices (vs an object store) and LVM
>>> [1].  For example, this works well with GP3 EBS w/ low IOPS provisioning +
>>> local NVMe to get a nice balance of great read performance without going
>>> nuts on the cost for IOPS.  I also wrote about this in a little more detail
>>> in my blog [2].  There's also the new mount point tech in AWS which pretty
>>> much does exactly what I've suggested above [3] that's probably worth
>>> evaluating just to get a feel for it.
>>>
>>> I'm not insisting we require LVM or the AWS S3 fs, since that would rule
>>> out other cloud providers, but I am pretty confident that the entire
>>> dataset should reside in the "cold" side of things for the practical and
>>> technical reasons I listed above.  I don't think it massively changes the
>>> proposal, and should simplify things for everyone.
>>>
>>> Jon
>>>
>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-8460
>>> [2] https://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/
>>> [3]
>>> https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/
>>>
>>>
>>> On Thu, Dec 14, 2023 at 1:56 AM Claude Warren  wrote:
>>>
 Is there still interest in this?  Can we get some points down on
 electrons so that we all understand the issues?

While it is fairly simple to redirect the read/write to something other
than the local system for a single node, this will not solve the problem
for tiered storage.

Tiered storage will require that on each read/write the primary key be
assessed to determine whether the operation should be redirected.  My
 reasoning for this statement is that in a cluster with a replication factor
 greater than 1 the node will store data for the keys that would be
 allocated to it in a cluster with a replication factor = 1, as well as some
 keys from nodes earlier in the ring.

 Even if we can get the primary keys for all the data we want to write
 to "cold storage" to map to a single node a replication factor > 1 means
 that data will also be placed in "normal storage" on subsequent nodes.

 To overcome this, we have to explore ways to route data to different
 storage based on the keys and that different storage may have to be
 available on _all_  the nodes.

 Have any of the partial solutions mentioned in this email chain (or
 others) solved this problem?

 Claude
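The replication point made above can be sketched with a toy token ring. This is illustrative Java, not Cassandra's actual placement code: with rf > 1 a key's data lands on several consecutive ring nodes, so any key-based storage routing would have to exist on every replica, not just one "primary" node.

```java
import java.util.*;

// Toy token ring: a key at some token is stored on the rf consecutive nodes
// starting at the first node whose token >= the key's token (wrapping around).
final class RingSketch
{
    static List<String> replicas(SortedMap<Integer, String> ring, int token, int rf)
    {
        List<String> out = new ArrayList<>();
        Iterator<String> it = ring.tailMap(token).values().iterator();
        while (out.size() < rf)
        {
            if (!it.hasNext())
                it = ring.values().iterator(); // wrap past the end of the ring
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args)
    {
        SortedMap<Integer, String> ring = new TreeMap<>(Map.of(10, "A", 20, "B", 30, "C"));
        // With rf = 1 the key at token 15 lives only on B; with rf = 2 it is
        // also on C, so "cold" routing for this key must exist on both nodes.
        if (!replicas(ring, 15, 1).equals(List.of("B")))
            throw new AssertionError();
        if (!replicas(ring, 15, 2).equals(List.of("B", "C")))
            throw new AssertionError();
        if (!replicas(ring, 35, 2).equals(List.of("A", "B"))) // wraps around
            throw new AssertionError();
        System.out.println("token 15, rf=2 -> " + replicas(ring, 15, 2));
    }
}
```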

>>>


Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Benedict Elliott Smith
Well, that looks like item number one to fix when we change the serialisation 
format. We should clearly not duplicate query strings we have recently logged.

We do however appear to also serialise the bind variables, which benefit from 
being in the format we already have available in memory.
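One way to avoid duplicating recently logged query strings, sketched here as a hedged illustration (an LRU interning table, not a proposed serialisation format): log the full text only on first sighting, and a small id thereafter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch: query text -> small id, with LRU eviction so the table
// stays bounded. After eviction the text is simply written in full again.
final class QueryStringDedup
{
    private final Map<String, Integer> recent;
    private int nextId;

    QueryStringDedup(int capacity)
    {
        // access-order LinkedHashMap doubles as a small LRU cache
        recent = new LinkedHashMap<>(16, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest)
            {
                return size() > capacity;
            }
        };
    }

    /** Returns {id, mustWriteText}: the second element is 1 on first sighting. */
    int[] intern(String query)
    {
        Integer id = recent.get(query);
        if (id != null)
            return new int[] { id, 0 };
        recent.put(query, nextId);
        return new int[] { nextId++, 1 };
    }

    public static void main(String[] args)
    {
        QueryStringDedup dedup = new QueryStringDedup(128);
        int[] first = dedup.intern("SELECT * FROM t WHERE k = ?");
        int[] again = dedup.intern("SELECT * FROM t WHERE k = ?");
        if (first[1] != 1 || again[1] != 0 || first[0] != again[0])
            throw new AssertionError();
        System.out.println("repeat logged as id " + again[0] + " with no text");
    }
}
```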

> On 19 Sep 2024, at 22:26, Štefan Miklošovič  wrote:
> 
> I am not sure what you mean. I mean, I do, but I am not following. Look into 
> FullQueryLogger (1): what it puts into the queue is the query as a String 
> wrapped in a Query object. It literally takes a String as the representation of 
> the query a user executed. We would just replace this by serializing that query 
> to protobuf. What is counterproductive? We just replace one thing with another. 
> Audit messages / events would be similar.
> 
> (1) 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/fql/FullQueryLogger.java#L320-L321
> 
> On Thu, Sep 19, 2024 at 11:17 PM J. D. Jordan wrote:
>> I think reserializing the payload into a new format is counter productive to 
>> some of the performance goals of the binary logs?
>> If you have to deserialize and reserialize the message you are going to be 
>> throwing off a ton of extra GC.
>> I think we have done a lot of work in recent versions to reduce the amount of 
>> re-serialization that happens in the query paths?  Not sure we want to add 
>> some back in on purpose?  Keeping the payload in the internal serialization 
>> format does indeed have the drawbacks David mentioned, but I think “zero 
>> serialization overhead” is a pretty big advantage to keeping things that way?
>> 
>> -Jeremiah
>> 
>>> On Sep 19, 2024, at 3:56 PM, Štefan Miklošovič wrote:
>>> 
>>> 
>>> I think protobuf upon serialization is just a bunch of bytes anyway. If we 
>>> figure out the header as David showed then we can still serialize it all 
>>> with the machinery / serializers you mentioned. It can write bytes, right?! 
>>> I very briefly checked and I think that protobuf is super simple and does 
>>> not have any checksumming etc. so some sauce on top of that would be 
>>> necessary anyway and we can reuse what we have to produce binary files.
>>> 
>>> On the consumer side, the binary file would be parsed with some tooling 
>>> e.g. in Go, indeed, but the headers and stuff would be so simple that it 
>>> would be just a coding exercise and then it might be deserialized with 
>>> protobuf for that language.
>>> 
>>> Basically, only the payload itself would be the product of protobuf and all 
>>> around super simple to crack through.
>>> 
>>> On Thu, Sep 19, 2024 at 10:41 PM Benedict wrote:
 Sorry, I missed that. I’m not convinced any of these logs need language 
 agnostic tools for access, but if that’s a goal for other folk I don’t 
 feel strongly about it.
 
> On 19 Sep 2024, at 21:06, Štefan Miklošovič wrote:
> 
> 
> More to it, it is actually not only about FQL. Audit logging is on 
> Chronicle queues too so inspecting that would be platform independent as 
> well. 
> 
> CEP-12 suggests that there might be a persistent store for diagnostic 
> events as well. If somebody wants to inspect what a node was doing after 
> it went offline as for now all these events are in memory only.
> 
> This would basically enable people to fully inspect what the cluster was 
> doing from FQL to Audit to Diagnostics in a language independent manner. 
> 
> On Thu, Sep 19, 2024 at 9:50 PM Štefan Miklošovič wrote:
>> I think the biggest selling point for using something like protobuf is 
>> what David said - what if he wants to replay it in Go? Basing it on 
>> something language neutral enables people to replay it in whatever they 
>> want. If we have something totally custom then it is replayable just in 
>> Java without bringing tons of dependencies to their projects. That is 
>> the message I got from what he wrote. 
>> 
>> On Thu, Sep 19, 2024 at 9:47 PM Benedict wrote:
>>> Do we need any of these things either? We have our own serialisation 
>>> framework and file readers and writers, and at least in the FQL case 
>>> these are the native serialisation format. 
>>> 
>>> At cursory glance it also looks to me like this would be a minimal 
>>> refactor from the current state.
>>> 
>>> What is the reason we want to add these other dependencies?
>>> 
>>> 
 On 19 Sep 2024, at 20:31, Štefan Miklošovič wrote:
 
 
 well the Maven plugin declares that it downloads protoc from Maven 
 Central automatically _somehow_ so coding up an Ant task which does 
 something similar shouldn't be too hard

Re: [EXTERNAL] [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-19 Thread Benedict Elliott Smith
I just want to flag here that this is a topic I have strong opinions on, but 
the CEP is not really specific or detailed enough to understand precisely how 
it will be implemented. So, if a patch is already being produced, most of my 
feedback is likely to be provided some time after a patch appears, through the 
normal review process. I want to flag this now to avoid any surprise.

I will say upfront that, ideally, this system should be designed to have 
~zero overhead when disabled, and with minimal coupling (between its own 
components and C* itself), so that entirely orthogonal approaches can be 
integrated in future without polluting the codebase.
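The "~zero overhead when disabled" principle can be sketched in a few lines. This is a hedged illustration, not the CEP's design: a constant enabled flag guards the limiter's hot path, so when disabled the check is a single predictable branch the JIT can fold away. The `limiter.enabled` property name is made up for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hedged sketch: a minimal request-spacing limiter behind a static final
// flag. Disabled (the default here), tryAcquire touches no atomics or clocks.
final class GuardedLimiter
{
    private static final boolean ENABLED = Boolean.getBoolean("limiter.enabled"); // illustrative flag

    private final long intervalNanos;
    private final AtomicLong nextFreeNanos = new AtomicLong(System.nanoTime());

    GuardedLimiter(long permitsPerSecond)
    {
        this.intervalNanos = 1_000_000_000L / permitsPerSecond;
    }

    /** True if the caller may proceed; never blocks. */
    boolean tryAcquire()
    {
        if (!ENABLED)
            return true; // disabled: effectively free after JIT constant folding
        while (true)
        {
            long now = System.nanoTime();
            long free = nextFreeNanos.get();
            if (now < free)
                return false; // over the configured rate: shed the request
            if (nextFreeNanos.compareAndSet(free, now + intervalNanos))
                return true;
        }
    }

    public static void main(String[] args)
    {
        GuardedLimiter limiter = new GuardedLimiter(1000);
        // With the flag unset, every acquire succeeds and costs one branch.
        for (int i = 0; i < 10_000; i++)
            if (!limiter.tryAcquire())
                throw new AssertionError("disabled limiter must never throttle");
        System.out.println("disabled limiter: all acquires passed");
    }
}
```

Because `ENABLED` is `static final`, entirely different limiting strategies could later replace the guarded body without the disabled path ever paying for them.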


> On 19 Sep 2024, at 19:14, Patrick McFadin  wrote:
> 
> The work has begun but we don't have a VOTE thread for this CEP. Can one get 
> started?
> 
> On Mon, May 6, 2024 at 9:24 PM Jaydeep Chovatia wrote:
>> Sure, Caleb. I will include the work as part of CASSANDRA-19534 in the
>> CEP-41.
>> 
>> Jaydeep
>> 
>> On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe wrote:
>>> FYI, there is some ongoing sort-of-related work going on in CASSANDRA-19534 
>>> 
>>> On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia wrote:
 Just created an official CEP-41 incorporating the feedback from this
 discussion. Feel free to let me know if I may have missed some important
 feedback in this thread that is not captured in the CEP-41.
 
 Jaydeep
 
 On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia wrote:
> Thanks, Josh. I will file an official CEP with all the details in a few 
> days and update this thread with that CEP number.
> Thanks a lot everyone for providing valuable insights!
> 
> Jaydeep
> 
> On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie wrote:
>>> Do folks think we should file an official CEP and take it there?
>> +1 here.
>> 
>> Synthesizing your gdoc, Caleb's work, and the feedback from this thread 
>> into a draft seems like a solid next step.
>> 
>> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>>> I see a lot of great ideas being discussed or proposed in the past to 
>>> cover the most common rate limiter candidate use cases. Do folks think 
>>> we should file an official CEP and take it there?
>>> 
>>> Jaydeep
>>> 
>>> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe wrote:
>>> I just remembered the other day that I had done a quick writeup on the 
>>> state of compaction stress-related throttling in the project:
>>> 
>>> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>>> 
>>> I'm sure most of it is old news to the people on this thread, but I 
>>> figured I'd post it just in case :)
>>> 
>>> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie wrote:
>>> 
 2.) We should make sure the links between the "known" root causes of 
 cascading failures and the mechanisms we introduce to avoid them 
 remain very strong.
>>> Seems to me that our historical strategy was to address individual 
>>> known cases one-by-one rather than looking for a more holistic 
>>> load-balancing and load-shedding solution. While the engineer in me 
>>> likes the elegance of a broad, more-inclusive actual SEDA-like 
>>> approach, the pragmatist in me wonders how far we think we are today 
>>> from a stable set-point.
>>> 
>>> i.e. are we facing a handful of cases where nodes can still get pushed 
>>> over and then cascade that we can surgically address, or are we facing 
>>> a broader lack of back-pressure that rears its head in different 
>>> domains (client -> coordinator, coordinator -> replica, internode with 
>>> other operations, etc) at surprising times and should be considered 
>>> more holistically?
>>> 
>>> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
 I almost forgot CASSANDRA-15817, which introduced 
 reject_repair_compaction_threshold, which provides a mechanism to stop 
 repairs while compaction is underwater.
 
> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe wrote:
> 
> Hey all,
> 
> I'm a bit late to the discussion. I see that we've already discussed 
> CASSANDRA-15013 
>  and 
> 

Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Štefan Miklošovič
Great stuff. Keep it going. If we go to replace this, then posts like this
are great for gathering feedback.

I like the protobuf idea. If we were on Maven, we could use the protobuf
plugin, which compiles the schema as part of the build and generates Java
classes from it, which we might use for the actual query serialization into
a binary format.

Unfortunately there is nothing like that for Ant; protoc would need to be a
local dependency on the machine which compiles the project, so that is kind
of a dead end. Or is there any workaround here?

On Thu, Sep 19, 2024 at 8:14 PM David Capwell  wrote:

> I personally don’t mind switching off Chronicle Queue.  I have a
> transformer function to convert the FQL logs to Thrift (don’t judge) and
> use easy-cas to replay on a cluster… replaying FQL from Chronicle Queue was
> far slower than Thrift, and it was hard to push the cluster because the
> client was the bottleneck… switching off it let me actually make Cassandra
> the bottleneck…
>
> No, there is another perfectly sensible option: just implement a simple
> serialisation format ourselves.
>
>
> My one issue with this is that we need to ask who the target audience is.
> Trying to add FQL replay to easy-cas was a pain for 2 reasons: Chronicle
> Queue is slow, and the custom C* serializers must be on the class path
> (which brings a ton of baggage with it)…
>
> For me FQL has 2 use cases
>
> 1) analytics: what are people actually doing, and what are their frequencies?
> 2) replay
>
> In both cases custom serializers are a pain due to the baggage they bring
> and their limiting nature… what if I want a Go-based FQL replay?  I need
> java code from cassandra-all…
>
> I personally favor serializers like protobuf/thrift as they are portable
> and can be used by users without issues.  As for the log format itself… a
> super simple custom log format that is easy to read is fine by me… I am
> cool with the log being custom as I don’t know a good portable log format
> off the top of my head… a simple thing like the following works for me
>
> Header: lengths, checksum, etc.
> Body: std serializer
> +
>
> On Sep 19, 2024, at 9:14 AM, C. Scott Andreas 
> wrote:
>
> Agree with Benedict's proposal here.
>
> In circumstances when I've needed to capture and work with FQL, I've found
> it cumbersome to work with Chronicle. The dial-home functionality and
> release process changes put it over the top for me.
>
> – Scott
>
> On Sep 19, 2024, at 8:40 AM, Josh McKenzie  wrote:
>
>
> there is another perfectly sensible option
>
> My apologies; I wasn't clear. *If we choose to continue to use chronicle
> queue*, what I enumerated was the only logical option I saw for us.
>
> Altogether I think we should just move away from the library as you've
> laid out here Benedict.
>
> On Thu, Sep 19, 2024, at 11:34 AM, Benedict wrote:
>
>
> No, there is another perfectly sensible option: just implement a simple
> serialisation format ourselves.
>
> I am against forking their code; that is a much higher maintenance burden
> than just writing something simple ourselves. We’ve spent longer
> collectively discussing and maintaining this dependency than it would take
> to implement the features we use.
>
> I still have not heard a compelling reason we adopted it as a dependency
> in the first place.
>
> On 19 Sep 2024, at 16:26, Josh McKenzie  wrote:
>
> 
>
> a jerk move, but they started it with this weird release model
>
> I think that's the only option given their release model and lack of
> backporting bugfixes to the latest ea. Either you run tip of the spear, pay
> them for bugfixes, or run what's effectively an unsupported LTS in the form
> of ea.
>
> So doesn't seem like a jerk move to me as much as it seems like an
> eventuality of their release model.
>
> On Wed, Sep 18, 2024, at 7:02 PM, Nate McCall wrote:
>
> I feel like a group of us discussed this IRL a bit at ApacheCon in Vegas ~
> 2019 maybe? Anyhoo, the tidbit sticking in my mind was someone explaining
> that the string-operation overhead of log concatenation in the JVM, versus
> slapping binary into CQ’s off-heap append operation, was substantial.
>
> We could hostile fork and bring the bits we use in tree (a jerk move, but
> they started it with this weird release model). I’d rather avoid this, but
> it’s an option seeing as how it’s ASFv2.
>
> On Thu, 19 Sep 2024 at 5:08 AM, Jeremiah Jordan 
> wrote:
>
>
> When it comes to alternatives, what about logback + slf4j? It has
> appenders where we want them, it is sync / async, and we could code some
> NIO appender too I guess. It logs as text into a file, so we do not need
> any special tooling to review it. For tailing, which Chronicle also offers,
> I guess "tail -f that.log" just does the job? logback even rolls the files
> once they are big enough, after some configured period / size, the same way
> Chronicle does (it even compresses the logs).
>
>
> Yes it was considered.  The whole point was to have a bi

Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Štefan Miklošovič
I think protobuf upon serialization is just a bunch of bytes anyway. If we
figure out the header as David showed then we can still serialize it all
with the machinery / serializers you mentioned. It can write bytes, right?!
I very briefly checked and I think that protobuf is super simple and does
not have any checksumming etc. so some sauce on top of that would be
necessary anyway and we can reuse what we have to produce binary files.

On the consumer side, the binary file would be parsed with some tooling
e.g. in Go, indeed, but the headers and stuff would be so simple that it
would be just a coding exercise and then it might be deserialized with
protobuf for that language.

Basically, only the payload itself would be the product of protobuf and all
around super simple to crack through.
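The "sauce on top" could be as little as a length + CRC32 frame around the opaque payload. A hedged sketch under that assumption - the layout below is illustrative, not an actual Cassandra file format:

```java
import java.io.*;
import java.util.zip.CRC32;

// Hedged sketch: [int length][long crc32][payload] framing around an opaque
// payload such as a serialized protobuf message. Simple enough that a reader
// in Go (or any language) is just a coding exercise.
final class FramedRecord
{
    static void write(DataOutputStream out, byte[] payload) throws IOException
    {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        out.writeInt(payload.length);
        out.writeLong(crc.getValue());
        out.write(payload);
    }

    static byte[] read(DataInputStream in) throws IOException
    {
        int length = in.readInt();
        long expected = in.readLong();
        byte[] payload = new byte[length];
        in.readFully(payload);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, length);
        if (crc.getValue() != expected)
            throw new IOException("corrupt record: checksum mismatch");
        return payload;
    }

    public static void main(String[] args) throws IOException
    {
        byte[] payload = "opaque protobuf bytes".getBytes("UTF-8");
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        write(new DataOutputStream(file), payload);
        byte[] back = read(new DataInputStream(new ByteArrayInputStream(file.toByteArray())));
        if (!java.util.Arrays.equals(payload, back))
            throw new AssertionError();
        System.out.println("frame round-trip ok");
    }
}
```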

On Thu, Sep 19, 2024 at 10:41 PM Benedict  wrote:

> Sorry, I missed that. I’m not convinced any of these logs need language
> agnostic tools for access, but if that’s a goal for other folk I don’t feel
> strongly about it.
>
> On 19 Sep 2024, at 21:06, Štefan Miklošovič 
> wrote:
>
> 
> More to it, it is actually not only about FQL. Audit logging is on
> Chronicle queues too so inspecting that would be platform independent as
> well.
>
> CEP-12 suggests that there might be a persistent store for diagnostic
> events as well. If somebody wants to inspect what a node was doing after it
> went offline as for now all these events are in memory only.
>
> This would basically enable people to fully inspect what the cluster was
> doing from FQL to Audit to Diagnostics in a language independent manner.
>
> On Thu, Sep 19, 2024 at 9:50 PM Štefan Miklošovič 
> wrote:
>
>> I think the biggest selling point for using something like protobuf is
>> what David said - what if he wants to replay it in Go? Basing it on
>> something language neutral enables people to replay it in whatever they
>> want. If we have something totally custom then it is replayable just in
>> Java without bringing tons of dependencies to their projects. That is the
>> message I got from what he wrote.
>>
>> On Thu, Sep 19, 2024 at 9:47 PM Benedict  wrote:
>>
>>> Do we need any of these things either? We have our own serialisation
>>> framework and file readers and writers, and at least in the FQL case these
>>> are the native serialisation format.
>>>
>>> At cursory glance it also looks to me like this would be a minimal
>>> refactor from the current state.
>>>
>>> What is the reason we want to add these other dependencies?
>>>
>>>
>>> On 19 Sep 2024, at 20:31, Štefan Miklošovič 
>>> wrote:
>>>
>>> 
>>> well the Maven plugin declares that it downloads protoc from Maven
>>> Central automatically _somehow_ so coding up an Ant task which does
>>> something similar shouldn't be too hard. I will investigate this idea.
>>>
>>> On Thu, Sep 19, 2024 at 9:26 PM Brandon Williams 
>>> wrote:
>>>
 On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič wrote:
 > Unfortunately there is nothing like that for Ant, protoc would need
 to be a local dependency on the computer which compiles the project to be
 able to do that so that is kind of a dead end. Or is there any workaround
 here?

 In the old thrift days I believe we generated the code and checked it
 in so you didn't need to compile locally.

 Kind Regards,
 Brandon

>>>


Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Štefan Miklošovič
I think the biggest selling point for using something like protobuf is what
David said - what if he wants to replay it in Go? Basing it on something
language neutral enables people to replay it in whatever they want. If we
have something totally custom then it is replayable just in Java without
bringing tons of dependencies to their projects. That is the message I got
from what he wrote.

On Thu, Sep 19, 2024 at 9:47 PM Benedict  wrote:

> Do we need any of these things either? We have our own serialisation
> framework and file readers and writers, and at least in the FQL case these
> are the native serialisation format.
>
> At cursory glance it also looks to me like this would be a minimal
> refactor from the current state.
>
> What is the reason we want to add these other dependencies?
>
>
> On 19 Sep 2024, at 20:31, Štefan Miklošovič 
> wrote:
>
> 
> well the Maven plugin declares that it downloads protoc from Maven Central
> automatically _somehow_ so coding up an Ant task which does something
> similar shouldn't be too hard. I will investigate this idea.
>
> On Thu, Sep 19, 2024 at 9:26 PM Brandon Williams  wrote:
>
>> On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič wrote:
>> > Unfortunately there is nothing like that for Ant, protoc would need to
>> be a local dependency on the computer which compiles the project to be able
>> to do that so that is kind of a dead end. Or is there any workaround here?
>>
>> In the old thrift days I believe we generated the code and checked it
>> in so you didn't need to compile locally.
>>
>> Kind Regards,
>> Brandon
>>
>


Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Benedict
Do we need any of these things either? We have our own serialisation
framework and file readers and writers, and at least in the FQL case these
are the native serialisation format.

At cursory glance it also looks to me like this would be a minimal refactor
from the current state.

What is the reason we want to add these other dependencies?

On 19 Sep 2024, at 20:31, Štefan Miklošovič wrote:

> well the Maven plugin declares that it downloads protoc from Maven Central
> automatically _somehow_ so coding up an Ant task which does something
> similar shouldn't be too hard. I will investigate this idea.
>
> On Thu, Sep 19, 2024 at 9:26 PM Brandon Williams wrote:
>
>> On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič wrote:
>> > Unfortunately there is nothing like that for Ant, protoc would need to
>> be a local dependency on the computer which compiles the project to be
>> able to do that so that is kind of a dead end. Or is there any workaround
>> here?
>>
>> In the old thrift days I believe we generated the code and checked it
>> in so you didn't need to compile locally.
>>
>> Kind Regards,
>> Brandon



Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread J. D. Jordan
I think reserializing the payload into a new format is counter productive to some of the performance goals of the binary logs? If you have to deserialize and reserialize the message you are going to be throwing off a ton of extra GC. I think we have done a lot of work in recent versions to reduce the amount of re-serialization that happens in the query paths?  Not sure we want to add some back in on purpose?  Keeping the payload in the internal serialization format does indeed have the drawbacks David mentioned, but I think “zero serialization overhead” is a pretty big advantage to keeping things that way?

-Jeremiah

On Sep 19, 2024, at 3:56 PM, Štefan Miklošovič wrote:

I think protobuf upon serialization is just a bunch of bytes anyway. If we figure out the header as David showed then we can still serialize it all with the machinery / serializers you mentioned. It can write bytes, right?! I very briefly checked and I think that protobuf is super simple and does not have any checksumming etc. so some sauce on top of that would be necessary anyway and we can reuse what we have to produce binary files.

On the consumer side, the binary file would be parsed with some tooling e.g. in Go, indeed, but the headers and stuff would be so simple that it would be just a coding exercise and then it might be deserialized with protobuf for that language.

Basically, only the payload itself would be the product of protobuf and all around super simple to crack through.

On Thu, Sep 19, 2024 at 10:41 PM Benedict wrote:

Sorry, I missed that. I’m not convinced any of these logs need language agnostic tools for access, but if that’s a goal for other folk I don’t feel strongly about it.

On 19 Sep 2024, at 21:06, Štefan Miklošovič wrote:

More to it, it is actually not only about FQL. Audit logging is on Chronicle queues too so inspecting that would be platform independent as well.

CEP-12 suggests that there might be a persistent store for diagnostic events as well. If somebody wants to inspect what a node was doing after it went offline, as for now all these events are in memory only.

This would basically enable people to fully inspect what the cluster was doing from FQL to Audit to Diagnostics in a language independent manner.

On Thu, Sep 19, 2024 at 9:50 PM Štefan Miklošovič wrote:

I think the biggest selling point for using something like protobuf is what David said - what if he wants to replay it in Go? Basing it on something language neutral enables people to replay it in whatever they want. If we have something totally custom then it is replayable just in Java without bringing tons of dependencies to their projects. That is the message I got from what he wrote.

On Thu, Sep 19, 2024 at 9:47 PM Benedict wrote:

Do we need any of these things either? We have our own serialisation framework and file readers and writers, and at least in the FQL case these are the native serialisation format. At cursory glance it also looks to me like this would be a minimal refactor from the current state.

What is the reason we want to add these other dependencies?

On 19 Sep 2024, at 20:31, Štefan Miklošovič wrote:

well the Maven plugin declares that it downloads protoc from Maven Central automatically _somehow_ so coding up an Ant task which does something similar shouldn't be too hard. I will investigate this idea.

On Thu, Sep 19, 2024 at 9:26 PM Brandon Williams wrote:

On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič
 wrote:
> Unfortunately there is nothing like that for Ant, protoc would need to be a local dependency on the computer which compiles the project to be able to do that so that is kind of a dead end. Or is there any workaround here?

In the old thrift days I believe we generated the code and checked it
in so you didn't need to compile locally.

Kind Regards,
Brandon
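As a concrete illustration of the "sauce on top" Štefan describes above — protobuf gives you an opaque payload but no framing or checksumming of its own — a minimal record frame could look like the following sketch. Python stands in here for the language-neutral consumer-side tooling discussed in this thread; all names and the exact layout are illustrative assumptions, not a proposed format:

```python
import struct
import zlib

def frame(payload: bytes) -> bytes:
    """Wrap an opaque payload (e.g. a serialized protobuf message) in a
    minimal [length][crc32][payload] frame. Protobuf itself provides
    neither framing nor checksums, so the log format must add both."""
    return struct.pack(">II", len(payload), zlib.crc32(payload)) + payload

def unframe(buf: bytes) -> bytes:
    """Read one frame back, verifying the checksum before trusting it."""
    length, crc = struct.unpack(">II", buf[:8])
    payload = buf[8:8 + length]
    if zlib.crc32(payload) != crc:
        raise ValueError("corrupt record")
    return payload
```

Any language with a CRC32 implementation can strip such a frame before handing the payload to its own protobuf runtime, which is the portability argument being made here.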






Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Benedict
I agree, even if we don’t manage the optimal zero conversion. I am also not entirely convinced we need to worry about compatibility for FQL and other logs - we can just say you must use the version of C* tools you produced the log with - I would even be fine with saying this isn’t even guaranteed to be compatible across minor versions (at least for FQL, perhaps not audit logging; but we can and should be explicit about our guarantees for each file we produce).

Most of this is not meant to be baked into production workflows, and where it is, users can rely on the command line tools bundled with C*. Programmatic consumers are primarily going to be us here on this list. Let’s not burden ourselves unnecessarily.

On 19 Sep 2024, at 22:17, J. D. Jordan wrote:

> I think reserializing the payload into a new format is counter productive
> to some of the performance goals of the binary logs? If you have to
> deserialize and reserialize the message you are going to be throwing off a
> ton of extra GC. I think we have done a lot of work in recent versions to
> reduce the amount of re-serialization that happens in the query paths?
> Not sure we want to add some back in on purpose? Keeping the payload in
> the internal serialization format does indeed have the drawbacks David
> mentioned, but I think “zero serialization overhead” is a pretty big
> advantage to keeping things that way?
>
> -Jeremiah






Re: [DISCUSS] Introduce CREATE TABLE LIKE grammar

2024-09-19 Thread guo Maxwell
No, I think we still need some discussion on the grammar details after I
finish the first version.

Patrick McFadin 于2024年9月20日 周五上午2:24写道:

> Is this CEP ready for a VOTE thread?
>
> On Sat, Aug 24, 2024 at 8:56 PM guo Maxwell  wrote:
>
>> Thank you for your replies, I will prepare a CEP later.
>>
>> Patrick McFadin  于2024年8月20日周二 02:11写道:
>>
>>> +1 This is a CEP
>>>
>>> On Mon, Aug 19, 2024 at 10:50 AM Jon Haddad  wrote:
>>>
 Given the fairly large surface area for this, i think it should be a
 CEP.

 —
 Jon Haddad
 Rustyrazorblade Consulting
 rustyrazorblade.com


 On Mon, Aug 19, 2024 at 10:44 AM Bernardo Botella <
 conta...@bernardobotella.com> wrote:

> Definitely a nice addition to CQL.
>
> Looking for inspiration at how Postgres and Mysql do that may also
> help with the final design (I like the WITH proposed by Stefan, but I 
> would
> definitely take a look at the INCLUDING keyword proposed by Postgres).
> https://www.postgresql.org/docs/current/sql-createtable.html
> https://dev.mysql.com/doc/refman/8.4/en/create-table-like.html
>
> On top of that, and as part of the interesting questions, I would like
> to add the permissions to the mix. Both the question about copying them
> over (with a WITH keyword probably), and the need for read permissions on
> the source table as well.
>
> Bernardo
>
> On Aug 19, 2024, at 10:01 AM, Štefan Miklošovič <
> smikloso...@apache.org> wrote:
>
> BTW this would be cool to do as well:
>
> ALTER TABLE ks.to_copy LIKE ks.tb WITH INDICES;
>
> This would mean that if we create a copy of a table, later we can
> decide that we need indices too, so we might "enrich" that table with
> indices from the old one without necessarily explicitly re-creating them 
> on
> that new table.
>
> On Mon, Aug 19, 2024 at 6:55 PM Štefan Miklošovič <
> smikloso...@apache.org> wrote:
>
>> I think this is an interesting idea worth exploring. I definitely
>> agree with Benjamin who raised important questions which needs to be
>> answered first. Also, what about triggers?
>>
>> It might be rather "easy" to come up with something simple but it
>> should be a comprehensive solution with predictable behavior we all agree
>> on.
>>
>> If a keyspace of a new table does not exist we would need to create
>> that one too before. For the simplicity, I would just make it a must to
>> create it on same keyspace. We might iterate on that in the future.
>>
>> UDTs are created per keyspace so there is nothing to re-create. We
>> just need to reference it from a new table, right?
>>
>> Indexes and MVs are interesting but in theory they might be
>> re-created too.
>>
>> Would it be appropriate to use something like this?
>>
>> CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES AND VIEWS AND
>> TRIGGERS 
>>
>> Without "WITH" it would just copy a table with nothing else.
>>
>> On Mon, Aug 19, 2024 at 6:10 PM guo Maxwell 
>> wrote:
>>
>>> Hello, everyone:
>>> As  Jira CASSANDRA-7662
>>>  has
>>> described, we would like to introduce a new grammar " CREATE TABLE LIKE ",
>>> which simplifies creating new tables by duplicating the existing ones.
>>> ,which  simplifies creating new tables duplicating the existing ones .
>>> The format may be like : CREATE TABLE  LIKE 
>>>
>>> Before I implement this function, do you have any suggestions on
>>> this?
>>>
>>> Looking forward to your reply!
>>>
>>
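Pulling together the variants floated in this thread, the hypothetical syntax under discussion (none of this is final grammar — pinning it down is exactly what the CEP is for) looks roughly like:

```cql
-- Plain copy: table schema only, same keyspace (proposed base form)
CREATE TABLE ks.tb_copy LIKE ks.tb;

-- Copy that also re-creates secondary objects (proposed WITH options)
CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES AND VIEWS AND TRIGGERS;

-- Later enrichment of an already-created copy (proposed ALTER variant)
ALTER TABLE ks.tb_copy LIKE ks.tb WITH INDICES;
```

For comparison, PostgreSQL spells the same idea as a column-list clause, `CREATE TABLE t (LIKE src INCLUDING INDEXES)`, which is the `INCLUDING` keyword Bernardo points to above.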
>


Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Brandon Williams
On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič
 wrote:
> Unfortunately there is nothing like that for Ant, protoc would need to be a 
> local dependency on the computer which compiles the project to be able to do 
> that so that is kind of a dead end. Or is there any workaround here?

In the old thrift days I believe we generated the code and checked it
in so you didn't need to compile locally.

Kind Regards,
Brandon


Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Štefan Miklošovič
well the Maven plugin declares that it downloads protoc from Maven Central
automatically _somehow_ so coding up an Ant task which does something
similar shouldn't be too hard. I will investigate this idea.

On Thu, Sep 19, 2024 at 9:26 PM Brandon Williams  wrote:

> On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič
>  wrote:
> > Unfortunately there is nothing like that for Ant, protoc would need to
> be a local dependency on the computer which compiles the project to be able
> to do that so that is kind of a dead end. Or is there any workaround here?
>
> In the old thrift days I believe we generated the code and checked it
> in so you didn't need to compile locally.
>
> Kind Regards,
> Brandon
>
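A rough Ant sketch of the idea Štefan floats above — fetch a protoc binary during the build and invoke it — might look like the following. The version, URL pattern, property names, and paths are all assumptions for illustration (a real task would need per-platform binaries and checksum verification), not a tested build change:

```xml
<target name="generate-protobuf"
        description="Hypothetical: fetch protoc and generate Java sources">
  <property name="protoc.version" value="27.0"/>
  <!-- Assumed URL pattern; a real build would pick the artifact per platform -->
  <get src="https://github.com/protocolbuffers/protobuf/releases/download/v${protoc.version}/protoc-${protoc.version}-linux-x86_64.zip"
       dest="${build.dir}/protoc.zip" skipexisting="true"/>
  <unzip src="${build.dir}/protoc.zip" dest="${build.dir}/protoc"/>
  <chmod file="${build.dir}/protoc/bin/protoc" perm="755"/>
  <exec executable="${build.dir}/protoc/bin/protoc" failonerror="true">
    <!-- Hypothetical .proto location and output directory -->
    <arg value="--java_out=${build.dir}/gen-java"/>
    <arg value="src/resources/fql.proto"/>
  </exec>
</target>
```

Alternatively, as Brandon notes, the generated sources could simply be checked in, as was done in the Thrift days, avoiding any build-time dependency on protoc.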


Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread David Capwell
> Do we need any of these things either? We have our own serialisation 
> framework and file readers and writers, and at least in the FQL case these 
> are the native serialisation format. 
> 
> At cursory glance it also looks to me like this would be a minimal refactor 
> from the current state.
> 
> What is the reason we want to add these other dependencies?

It’s all about the target user of the feature.  I can’t speak for audit logging 
(why do we need more than slf4j?  No clue), but one of the users for Chronicle 
Queue is FQL.  We do have a FQL replay command line tool, but we are not trying
to make this a powerful tool with a ton of ways to replay with different rates, 
interleaving, etc… it's a basic “I run w/e I see in the logs”.  We have kinda 
moved away from stress tools being in-tree and letting this evolve outside of 
our code base, so looking in this direction it makes sense for FQL replay to be 
external to the Apache Cassandra tree… in fact leveraging existing tools made 
FQL replay faster and far more powerful than the in-tree version…

With all that, my view of who the target user of FQL data is the 
users/developers and not internal to Apache Cassandra… so I then need to ask 
what is the experience using it…

If we use internal serializers (that are strongly coupled with our internal 
networking) then the user needs to depend on cassandra-all… this brings in 74 
dependencies (see 
https://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all/5.0.0) 
and none of these matter to the user, so you must exclude every single one or 
you just accept w/e we bring in (which means you are stuck with Java Driver 3, 
can’t use Java Driver 4).

Now that you got that out of the way, you can add the log reading into your 
tools and do what you want, as long as they are in Java.

So, using our “serialization framework” only seems to come with burdens to me:

1) its versioning is our internal message versioning, so if we make changes to 
our networking FQL is forced to bump its version as well.  If we need to change 
the log format we need to also bump our networking version…. 
2) for the project we add even more public classes to the list we need to 
maintain compatibility with (I have no clue what is public right now, we debate 
this w/e it comes up), so refactoring our CQL processing layer gets harder
3) cassandra-all is massive
4) in order to reuse outside of java we need to implement translations to a 
more common format so other languages can use… I do have tools in python to 
read the Thrift FQL log I write and compute stats on user behavior… it would be 
nice to leverage the log file directly and not have to translate it

> On Sep 19, 2024, at 1:04 PM, Štefan Miklošovič  wrote:
> 
> More to it, it is actually not only about FQL. Audit logging is on Chronicle 
> queues too so inspecting that would be platform independent as well. 
> 
> CEP-12 suggests that there might be a persistent store for diagnostic events 
> as well. If somebody wants to inspect what a node was doing after it went 
> offline as for now all these events are in memory only.
> 
> This would basically enable people to fully inspect what the cluster was 
> doing from FQL to Audit to Diagnostics in a language independent manner. 
> 
> On Thu, Sep 19, 2024 at 9:50 PM Štefan Miklošovič  > wrote:
>> I think the biggest selling point for using something like protobuf is what 
>> David said - what if he wants to replay it in Go? Basing it on something 
>> language neutral enables people to replay it in whatever they want. If we 
>> have something totally custom then it is replayable just in Java without 
>> bringing tons of dependencies to their projects. That is the message I got 
>> from what he wrote. 
>> 
>> On Thu, Sep 19, 2024 at 9:47 PM Benedict > > wrote:
>>> Do we need any of these things either? We have our own serialisation 
>>> framework and file readers and writers, and at least in the FQL case these 
>>> are the native serialisation format. 
>>> 
>>> At cursory glance it also looks to me like this would be a minimal refactor 
>>> from the current state.
>>> 
>>> What is the reason we want to add these other dependencies?
>>> 
>>> 
 On 19 Sep 2024, at 20:31, Štefan Miklošovič >>> > wrote:
 
 
 well the Maven plugin declares that it downloads protoc from Maven Central 
 automatically _somehow_ so coding up an Ant task which does something 
 similar shouldn't be too hard. I will investigate this idea. 
 
 On Thu, Sep 19, 2024 at 9:26 PM Brandon Williams >>> > wrote:
> On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič
> mailto:smikloso...@apache.org>> wrote:
> > Unfortunately there is nothing like that for Ant, protoc would need to 
> > be a local dependency on the computer which compiles the project to be 
> > able to do that so that is kind of a dead end. Or is there any workaround
> > here?

Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Benedict
Sorry, I missed that. I’m not convinced any of these logs need language agnostic tools for access, but if that’s a goal for other folk I don’t feel strongly about it.

On 19 Sep 2024, at 21:06, Štefan Miklošovič wrote:

> More to it, it is actually not only about FQL. Audit logging is on
> Chronicle queues too so inspecting that would be platform independent as
> well.
>
> CEP-12 suggests that there might be a persistent store for diagnostic
> events as well. If somebody wants to inspect what a node was doing after
> it went offline, as for now all these events are in memory only.
>
> This would basically enable people to fully inspect what the cluster was
> doing from FQL to Audit to Diagnostics in a language independent manner.





Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Štefan Miklošovič
More to it, it is actually not only about FQL. Audit logging is on
Chronicle queues too so inspecting that would be platform independent as
well.

CEP-12 suggests that there might be a persistent store for diagnostic
events as well. If somebody wants to inspect what a node was doing after it
went offline as for now all these events are in memory only.

This would basically enable people to fully inspect what the cluster was
doing from FQL to Audit to Diagnostics in a language independent manner.

On Thu, Sep 19, 2024 at 9:50 PM Štefan Miklošovič 
wrote:

> I think the biggest selling point for using something like protobuf is
> what David said - what if he wants to replay it in Go? Basing it on
> something language neutral enables people to replay it in whatever they
> want. If we have something totally custom then it is replayable just in
> Java without bringing tons of dependencies to their projects. That is the
> message I got from what he wrote.
>
> On Thu, Sep 19, 2024 at 9:47 PM Benedict  wrote:
>
>> Do we need any of these things either? We have our own serialisation
>> framework and file readers and writers, and at least in the FQL case these
>> are the native serialisation format.
>>
>> At cursory glance it also looks to me like this would be a minimal
>> refactor from the current state.
>>
>> What is the reason we want to add these other dependencies?
>>
>>
>> On 19 Sep 2024, at 20:31, Štefan Miklošovič 
>> wrote:
>>
>> 
>> well the Maven plugin declares that it downloads protoc from Maven
>> Central automatically _somehow_ so coding up an Ant task which does
>> something similar shouldn't be too hard. I will investigate this idea.
>>
>> On Thu, Sep 19, 2024 at 9:26 PM Brandon Williams 
>> wrote:
>>
>>> On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič
>>>  wrote:
>>> > Unfortunately there is nothing like that for Ant, protoc would need to
>>> be a local dependency on the computer which compiles the project to be able
>>> to do that so that is kind of a dead end. Or is there any workaround here?
>>>
>>> In the old thrift days I believe we generated the code and checked it
>>> in so you didn't need to compile locally.
>>>
>>> Kind Regards,
>>> Brandon
>>>
>>


Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Štefan Miklošovič
I am not sure what you mean. I mean, I do, but not following. Look into
FullQueryLogger (1): what it puts into CQ (Chronicle Queue) is a query as a
String wrapped in a Query object. It literally takes a String as a
representation of a query a user executed. We just replace this by
serializing that query to protobuf. What is counter productive? We just
replace one thing for another. Audit messages / events would be similar.

(1)
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/fql/FullQueryLogger.java#L320-L321

On Thu, Sep 19, 2024 at 11:17 PM J. D. Jordan 
wrote:

> I think reserializing the payload into a new format is counter productive
> to some of the performance goals of the binary logs?
> If you have to deserialize and reserialize the message you are going to be
> throwing off a ton of extra GC.
> I think we have done a lot of work in recent versions to reduce the amount
> of re-serialization that happens in the query paths?  Not sure we want to
> add some back in on purpose?  Keeping the payload in the internal
> serialization format does indeed have the drawbacks David mentioned, but I
> think “zero serialization overhead” is a pretty big advantage to keeping
> things that way?
>
> -Jeremiah
>
> On Sep 19, 2024, at 3:56 PM, Štefan Miklošovič 
> wrote:
>
> 
> I think protobuf upon serialization is just a bunch of bytes anyway. If we
> figure out the header as David showed then we can still serialize it all
> with the machinery / serializers you mentioned. It can write bytes, right?!
> I very briefly checked and I think that protobuf is super simple and does
> not have any checksumming etc. so some sauce on top of that would be
> necessary anyway and we can reuse what we have to produce binary files.
>
> On the consumer side, the binary file would be parsed with some tooling
> e.g. in Go, indeed, but the headers and stuff would be so simple that it
> would be just a coding exercise and then it might be deserialized with
> protobuf for that language.
>
> Basically, only the payload itself would be the product of protobuf and
> all around super simple to crack through.
>
> On Thu, Sep 19, 2024 at 10:41 PM Benedict  wrote:
>
>> Sorry, I missed that. I’m not convinced any of these logs need language
>> agnostic tools for access, but if that’s a goal for other folk I don’t feel
>> strongly about it.
>>
>> On 19 Sep 2024, at 21:06, Štefan Miklošovič 
>> wrote:
>>
>> 
>> More to it, it is actually not only about FQL. Audit logging is on
>> Chronicle queues too so inspecting that would be platform independent as
>> well.
>>
>> CEP-12 suggests that there might be a persistent store for diagnostic
>> events as well. If somebody wants to inspect what a node was doing after it
>> went offline as for now all these events are in memory only.
>>
>> This would basically enable people to fully inspect what the cluster was
>> doing from FQL to Audit to Diagnostics in a language independent manner.
>>
>> On Thu, Sep 19, 2024 at 9:50 PM Štefan Miklošovič 
>> wrote:
>>
>>> I think the biggest selling point for using something like protobuf is
>>> what David said - what if he wants to replay it in Go? Basing it on
>>> something language neutral enables people to replay it in whatever they
>>> want. If we have something totally custom then it is replayable just in
>>> Java without bringing tons of dependencies to their projects. That is the
>>> message I got from what he wrote.
>>>
>>> On Thu, Sep 19, 2024 at 9:47 PM Benedict  wrote:
>>>
 Do we need any of these things either? We have our own serialisation
 framework and file readers and writers, and at least in the FQL case these
 are the native serialisation format.

 At cursory glance it also looks to me like this would be a minimal
 refactor from the current state.

 What is the reason we want to add these other dependencies?


 On 19 Sep 2024, at 20:31, Štefan Miklošovič 
 wrote:

 
 well the Maven plugin declares that it downloads protoc from Maven
 Central automatically _somehow_ so coding up an Ant task which does
 something similar shouldn't be too hard. I will investigate this idea.

 On Thu, Sep 19, 2024 at 9:26 PM Brandon Williams 
 wrote:

> On Thu, Sep 19, 2024 at 2:16 PM Štefan Miklošovič
>  wrote:
> > Unfortunately there is nothing like that for Ant, protoc would need
> to be a local dependency on the computer which compiles the project to be
> able to do that so that is kind of a dead end. Or is there any workaround
> here?
>
> In the old thrift days I believe we generated the code and checked it
> in so you didn't need to compile locally.
>
> Kind Regards,
> Brandon
>



Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread David Capwell
> I think reserializing the payload into a new format is counter productive to 
> some of the performance goals of the binary logs? If you have to deserialize 
> and reserialize the message you are going to be throwing off a ton of extra 
> GC.

That isn’t what happens in FQL =D.

FQL creates a custom payload using Chronicle fields, then serializes 
QueryOptions (we have in-memory objects we use for the query).

We are not taking the client network bytes and saving to a log (client bytes 
could be different pages… this would be annoying to support), we are working 
with the following

String query
ByteBuffer[] binds
QueryOptions options

“Could” we use our networking serializer?  Sure, but then what do we get?  The 
cost to construct the object to pass to the serializer is basically the same, 
so it’s just the time it takes to serialize it, and I argue in this very 
specific case the costs are not really that noticeable (and have 
benchmarked...).

So, we put a burden on users (and us to maintain binary compatibility with 
QueryOptions), making it harder for them at the cost of a few nanoseconds more 
to serialize?

> On Sep 19, 2024, at 3:32 PM, Štefan Miklošovič  wrote:
> 
> Wow this is quite a rabbit hole. 
> 
> What is ultimately going to be written into Chronicle Queue is what 
> writeMarshallablePayload method on AbstractLogQuery puts into that WireOut. 
> If we take e.g. QUERY_OPTIONS into consideration, then it writes it into 
> queryOptionsBuffer which is populated in AbstractLogEntry's constructor 
> (QueryOptions.codec.encode).
> 
> That takes QueryOptions, deserialized stuff we got from QueryMessage via 
> codec in its decode method, and it encodes it back to ByteBuf. So for now, we 
> just serialize what we deserialized all over again. 
> 
> But for what reason do we need to serialize it again upon logging it to FQL? 
> "body" here which is used for decoding bytes to QueryOptions is ByteBuf 
> already. So if we go to write bytes then well here we have it. It does not 
> seem to be necessary to decode / encode, just use these bytes as they are?
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L51
> 
> On Thu, Sep 19, 2024 at 11:50 PM Benedict Elliott Smith  > wrote:
>> Well, that looks like item number one to fix when we change the 
>> serialisation format. We should clearly not duplicate query strings we have 
>> recently logged.
>> 
>> We do however appear to also serialise the bind variables, which benefit 
>> from being in the format we already have available in memory.
>> 
>>> On 19 Sep 2024, at 22:26, Štefan Miklošovič >> > wrote:
>>> 
>>> I am not sure what you mean. I mean, I do, but not following. Look into 
>>> FullQueryLogger (1): what it puts into CQ (Chronicle Queue) is a query as 
>>> a String wrapped in a Query object. It literally takes a String as a 
>>> representation of a query a user executed. We just replace this by 
>>> serializing that query to protobuf. What is counter productive? We just 
>>> replace one thing for another. Audit messages / events would be similar. 
>>> 
>>> (1) 
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/fql/FullQueryLogger.java#L320-L321
>>> 
>>> On Thu, Sep 19, 2024 at 11:17 PM J. D. Jordan >> > wrote:
 I think reserializing the payload into a new format is counter productive 
 to some of the performance goals of the binary logs?
 If you have to deserialize and reserialize the message you are going to be 
 throwing off a ton of extra GC.
 I think we have done a lot of work in recent versions to reduce the amount 
 of re-serialization that happens in the query paths?  Not sure we want to 
 add some back in on purpose?  Keeping the payload in the internal 
 serialization format does indeed have the drawbacks David mentioned, but I 
 think “zero serialization overhead” is a pretty big advantage to keeping 
 things that way?
 
 -Jeremiah
 
> On Sep 19, 2024, at 3:56 PM, Štefan Miklošovič  > wrote:
> 
> 
> I think protobuf upon serialization is just a bunch of bytes anyway. If 
> we figure out the header as David showed then we can still serialize it 
> all with the machinery / serializers you mentioned. It can write bytes, 
> right?! I very briefly checked and I think that protobuf is super simple 
> and does not have any checksumming etc. so some sauce on top of that 
> would be necessary anyway and we can reuse what we have to produce binary 
> files.
> 
> On the consumer side, the binary file would be parsed with some tooling 
> e.g. in Go, indeed, but the headers and stuff would be so simple that it 
> would be just a coding exercise and then it might be deserialized with 
> protobuf for that language.
> 
> Basically, only the payload itself would be the product of protobuf and
> all around super simple to crack through.
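To make the shape of this argument concrete: a language-neutral record for the (query, binds, options) triple David lists above could length-prefix each field and carry the already-encoded options buffer verbatim, which is Štefan's "use these bytes as they are" point — no decode/re-encode round trip. A minimal sketch in Python (again standing in for external consumer tooling; the layout and names are invented for illustration, not the actual FQL format):

```python
import struct

def write_record(query, binds, options_bytes):
    """Serialize one log record as length-prefixed fields.
    options_bytes is kept as the raw buffer received from the client,
    avoiding a decode/re-encode round trip of QueryOptions."""
    out = bytearray()
    q = query.encode("utf-8")
    out += struct.pack(">I", len(q)) + q          # query string
    out += struct.pack(">I", len(binds))          # number of bind values
    for b in binds:
        out += struct.pack(">I", len(b)) + b      # each bind as raw bytes
    out += struct.pack(">I", len(options_bytes)) + options_bytes
    return bytes(out)

def read_record(buf):
    """Parse a record written by write_record; returns (query, binds, options)."""
    off = 0
    def take(n):
        nonlocal off
        chunk = buf[off:off + n]
        off += n
        return chunk
    (qlen,) = struct.unpack(">I", take(4))
    query = take(qlen).decode("utf-8")
    (nbinds,) = struct.unpack(">I", take(4))
    binds = []
    for _ in range(nbinds):
        (blen,) = struct.unpack(">I", take(4))
        binds.append(take(blen))
    (olen,) = struct.unpack(">I", take(4))
    options = take(olen)
    return query, binds, options
```

Because every field is an opaque length-prefixed blob, a reader in Go or Python never needs Cassandra's internal serializers; only a consumer that actually wants to interpret the options blob has to understand its encoding.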

Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread C. Scott Andreas

Agree with Benedict's proposal here. In circumstances when I've needed to capture and work with FQL, I've found it cumbersome to work with Chronicle. The dial-home functionality and release process changes put it over the top for me.

– Scott

On Sep 19, 2024, at 8:40 AM, Josh McKenzie wrote:

> there is another perfectly sensible option

My apologies; I wasn't clear. If we choose to continue to use Chronicle Queue, what I enumerated was the only logical option I saw for us. Altogether I think we should just move away from the library as you've laid out here, Benedict.

On Thu, Sep 19, 2024, at 11:34 AM, Benedict wrote:

No, there is another perfectly sensible option: just implement a simple serialisation format ourselves. I am against forking their code; that is a much higher maintenance burden than just writing something simple ourselves. We’ve spent longer collectively discussing and maintaining this dependency than it would take to implement the features we use. I still have not heard a compelling reason we adopted it as a dependency in the first place.

On 19 Sep 2024, at 16:26, Josh McKenzie wrote:

> a jerk move, but they started it with this weird release model

I think that's the only option given their release model and lack of backporting bugfixes to the latest ea. Either you run tip of the spear, pay them for bugfixes, or run what's effectively an unsupported LTS in the form of ea. So it doesn't seem like a jerk move to me as much as it seems like an eventuality of their release model.

On Wed, Sep 18, 2024, at 7:02 PM, Nate McCall wrote:

I feel like a group of us discussed this IRL a bit at ApacheCon in Vegas ~2019 maybe? Anyhoo, the tidbit sticking in my mind was someone explaining that the string operations overhead in the JVM of log concatenation vs slapping binary to CQ’s off-heap-and-append operation was substantial. We could hostile fork and bring the bits we use in tree (a jerk move, but they started it with this weird release model). I’d rather avoid this, but it’s an option seeing as how it’s ASFv2.

On Thu, 19 Sep 2024 at 5:08 AM, Jeremiah Jordan wrote:

> When it comes to alternatives, what about logback + slf4j? It has
> appenders where we want, it is sync / async, we can code some nio appender
> too I guess, and it logs as text into a file so we do not need any special
> tooling to review that. For tailing, which Chronicle also offers, I guess
> "tail -f that.log" just does the job? logback even rolls the files once
> they are big enough, the same way as Chronicle does after some configured
> period / size (it even compresses the logs).

Yes, it was considered. The whole point was to have a binary log because serialization to/from text (remember, replay is part of this) explodes the size on disk and in memory as well as the processing time required, and does not meet the timing requirements of fqltool.

-Jeremiah

Re: [DISCUSS] CEP-31 negotiated authentication

2024-09-19 Thread Patrick McFadin
Hi Jacek,

I was doing some housekeeping on CEPs and noticed this stalled. Is this
still a CEP you are advocating for?

Anyone else that knows the status, feel free to add in.

Patrick

On Wed, May 31, 2023 at 8:26 AM Derek Chen-Becker 
wrote:

> Hi Jacek,
>
> I took a quick look through the CEP and I think I understand the
> implementation you're donating. I don't think that the approach you're
> taking and the approach I proposed are contradictory, but I want to make
> sure I'm understanding some aspects of the CEP:
>
> 1. Is there any mechanism for discovery so that the client knows which
> authenticators are supported? The main use case I see here is that since
> the client drives selection of the authenticator, the client probably wants
> to utilize the strongest mutually supported mechanism
> 2. Can you specify the client/server exchange in a state diagram or more
> clearly detail which messages are involved? The CEP states that "The driver
> sends an additional preamble along with the initial SASL authentication
> message". Is the "initial SASL auth message" the AUTH_RESPONSE? Are you
> basically saying that the server sends the AUTHENTICATE message with a
> class name, so does the client basically respond with "No, here's the
> authenticator I want to use" in the preamble?
> 3. Does the donated code for the server already handle hot reconfiguration
> of authenticators? The CEP states "We want to make it possible to add, ..."
> so I wasn't sure if that was future work or not
>
> I think I need to re-read and digest, but on first run-through this looks
> really interesting!
>
> Cheers,
>
> Derek
>
> On Fri, May 26, 2023 at 8:09 AM Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
>
>> Hi,
>>
>> I'd like to start a discussion on negotiated authentication and
>> improvements to authentication, authorization, and role management in
>> general. A draft of proposed changes is included in CEP-31.
>>
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-31+%28DRAFT%29+Negotiated+authentication+and+authorization
>>
>> thanks,
>> - - -- --- -  -
>> Jacek Lewandowski
>>
>
>
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
>
>


Re: [DISCUSS] CEP-31 negotiated authentication

2024-09-19 Thread Dinesh Joshi
This is an area of interest for me personally and is an important feature.
Not sure if the original author is going to see it through since we've not
had any discussion on it for a while.

Is anybody interested in picking this up?

Dinesh

On Thu, Sep 19, 2024 at 10:54 AM Patrick McFadin  wrote:

> Hi Jacek,
>
> I was doing some housekeeping on CEPs and noticed this stalled. Is this
> still a CEP you are advocating for?
>
> Anyone else that knows the status, feel free to add in.
>
> Patrick
>
> On Wed, May 31, 2023 at 8:26 AM Derek Chen-Becker 
> wrote:
>
>> Hi Jacek,
>>
>> I took a quick look through the CEP and I think I understand the
>> implementation you're donating. I don't think that the approach you're
>> taking and the approach I proposed are contradictory, but I want to make
>> sure I'm understanding some aspects of the CEP:
>>
>> 1. Is there any mechanism for discovery so that the client knows which
>> authenticators are supported? The main use case I see here is that since
>> the client drives selection of the authenticator, the client probably wants
>> to utilize the strongest mutually supported mechanism
>> 2. Can you specify the client/server exchange in a state diagram or more
>> clearly detail which messages are involved? The CEP states that "The driver
>> sends an additional preamble along with the initial SASL authentication
>> message". Is the "initial SASL auth message" the AUTH_RESPONSE? Are you
>> basically saying that the server sends the AUTHENTICATE message with a
>> class name, so does the client basically respond with "No, here's the
>> authenticator I want to use" in the preamble?
>> 3. Does the donated code for the server already handle hot
>> reconfiguration of authenticators? The CEP states "We want to make it
>> possible to add, ..." so I wasn't sure if that was future work or not
>>
>> I think I need to re-read and digest, but on first run-through this looks
>> really interesting!
>>
>> Cheers,
>>
>> Derek
>>
>> On Fri, May 26, 2023 at 8:09 AM Jacek Lewandowski <
>> lewandowski.ja...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'd like to start a discussion on negotiated authentication and
>>> improvements to authentication, authorization, and role management in
>>> general. A draft of proposed changes is included in CEP-31.
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-31+%28DRAFT%29+Negotiated+authentication+and+authorization
>>>
>>> thanks,
>>> - - -- --- -  -
>>> Jacek Lewandowski
>>>
>>
>>
>> --
>> +---+
>> | Derek Chen-Becker |
>> | GPG Key available at https://keybase.io/dchenbecker and   |
>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>> +---+
>>
>>


CEP-32: Open-Telemetry integration

2024-09-19 Thread Patrick McFadin
Here's another stalled CEP. In this case, no discuss thread or Jira.

Yuki (or anyone else) know the status of this CEP?

https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-32%3A+%28DRAFT%29+OpenTelemetry+integration

Patrick


Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Josh McKenzie
> a jerk move, but they started it with this weird release model
I think that's the only option given their release model and lack of 
backporting bugfixes to the latest ea. Either you run tip of the spear, pay 
them for bugfixes, or run what's effectively an unsupported LTS in the form of 
ea.

So doesn't seem like a jerk move to me as much as it seems like an eventuality 
of their release model.

On Wed, Sep 18, 2024, at 7:02 PM, Nate McCall wrote:
> I feel like a group of us discussed this IRL a bit at ApacheCon in Vegas ~ 
> 2019 maybe? Anyhoo, the tidbit sticking in my mind was someone explaining the 
> string operations overhead in the JVM of log concatenation vs slapping binary 
> to CQ’s off heap-and-append operation was substantial. 
> 
> We could hostile fork and bring the bits we use in tree (a jerk move, but 
> they started it with this weird release model). I’d rather avoid this, but 
> it’s an option seeing as how it’s ASFv2. 
> 
> On Thu, 19 Sep 2024 at 5:08 AM, Jeremiah Jordan  
> wrote:
>> 
>>> When it comes to alternatives, what about logback + slf4j? It has appenders 
>>> where we want, it is sync / async, we can code some nio appender too I 
>>> guess, it logs it as text into a file so we do not need any special tooling 
>>> to review that. For tailing which Chronicle also offers, I guess "tail -f 
>>> that.log" just does the job? logback even rolls the files after they are 
>>> big enough so it rolls the files the same way after some configured period 
>>> / size as Chronicle does (It even compresses the logs).
>> 
>> Yes it was considered.  The whole point was to have a binary log because 
>> serialization to/from (remember replay is part of this) text explodes the 
>> size on disk and in memory as well as the processing time required and does 
>> not meet the timing requirements of fqltool.
>> 
>> -Jeremiah


Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Benedict
No, there is another perfectly sensible option: just implement a simple 
serialisation format ourselves.

I am against forking their code; that is a much higher maintenance burden than 
just writing something simple ourselves. We’ve spent longer collectively 
discussing and maintaining this dependency than it would take to implement the 
features we use.

I still have not heard a compelling reason we adopted it as a dependency in 
the first place.

On 19 Sep 2024, at 16:26, Josh McKenzie  wrote:

> > a jerk move, but they started it with this weird release model
> I think that's the only option given their release model and lack of 
> backporting bugfixes to the latest ea. Either you run tip of the spear, pay 
> them for bugfixes, or run what's effectively an unsupported LTS in the form 
> of ea.
> 
> So doesn't seem like a jerk move to me as much as it seems like an 
> eventuality of their release model.
> 
> On Wed, Sep 18, 2024, at 7:02 PM, Nate McCall wrote:
> > I feel like a group of us discussed this IRL a bit at ApacheCon in Vegas 
> > ~ 2019 maybe? Anyhoo, the tidbit sticking in my mind was someone 
> > explaining the string operations overhead in the JVM of log 
> > concatenation vs slapping binary to CQ’s off heap-and-append operation 
> > was substantial. 
> > 
> > We could hostile fork and bring the bits we use in tree (a jerk move, 
> > but they started it with this weird release model). I’d rather avoid 
> > this, but it’s an option seeing as how it’s ASFv2. 
> > 
> > On Thu, 19 Sep 2024 at 5:08 AM, Jeremiah Jordan wrote:
> > > > When it comes to alternatives, what about logback + slf4j? It has 
> > > > appenders where we want, it is sync / async, we can code some nio 
> > > > appender too I guess, it logs it as text into a file so we do not 
> > > > need any special tooling to review that. For tailing which Chronicle 
> > > > also offers, I guess "tail -f that.log" just does the job? logback 
> > > > even rolls the files after they are big enough so it rolls the files 
> > > > the same way after some configured period / size as Chronicle does 
> > > > (It even compresses the logs).
> > > 
> > > Yes it was considered. The whole point was to have a binary log 
> > > because serialization to/from (remember replay is part of this) text 
> > > explodes the size on disk and in memory as well as the processing time 
> > > required and does not meet the timing requirements of fqltool.
> > > 
> > > -Jeremiah

Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread Josh McKenzie
> there is another perfectly sensible option
My apologies; I wasn't clear. *If we choose to continue to use chronicle 
queue*, what I enumerated was the only logical option I saw for us.

Altogether I think we should just move away from the library as you've laid out 
here Benedict.

On Thu, Sep 19, 2024, at 11:34 AM, Benedict wrote:
> 
> No, there is another perfectly sensible option: just implement a simple 
> serialisation format ourselves.
> 
> I am against forking their code; that is a much higher maintenance burden 
> than just writing something simple ourselves. We’ve spent longer collectively 
> discussing and maintaining this dependency than it would take to implement 
> the features we use.
> 
> I still have not heard a compelling reason we adopted it as a dependency in 
> the first place.
> 
>> On 19 Sep 2024, at 16:26, Josh McKenzie  wrote:
>> 
>>> a jerk move, but they started it with this weird release model
>> I think that's the only option given their release model and lack of 
>> backporting bugfixes to the latest ea. Either you run tip of the spear, pay 
>> them for bugfixes, or run what's effectively an unsupported LTS in the form 
>> of ea.
>> 
>> So doesn't seem like a jerk move to me as much as it seems like an 
>> eventuality of their release model.
>> 
>> On Wed, Sep 18, 2024, at 7:02 PM, Nate McCall wrote:
>>> I feel like a group of us discussed this IRL a bit at ApacheCon in Vegas ~ 
>>> 2019 maybe? Anyhoo, the tidbit sticking in my mind was someone explaining 
>>> the string operations overhead in the JVM of log concatenation vs slapping 
>>> binary to CQ’s off heap-and-append operation was substantial. 
>>> 
>>> We could hostile fork and bring the bits we use in tree (a jerk move, but 
>>> they started it with this weird release model). I’d rather avoid this, but 
>>> it’s an option seeing as how it’s ASFv2. 
>>> 
>>> On Thu, 19 Sep 2024 at 5:08 AM, Jeremiah Jordan  
>>> wrote:
 
> When it comes to alternatives, what about logback + slf4j? It has 
> appenders where we want, it is sync / async, we can code some nio 
> appender too I guess, it logs it as text into a file so we do not need 
> any special tooling to review that. For tailing which Chronicle also 
> offers, I guess "tail -f that.log" just does the job? logback even rolls 
> the files after they are big enough so it rolls the files the same way 
> after some configured period / size as Chronicle does (It even compresses 
> the logs).
 
 Yes it was considered.  The whole point was to have a binary log because 
 serialization to/from (remember replay is part of this) text explodes the 
 size on disk and in memory as well as the processing time required and 
 does not meet the timing requirements of fqltool.
 
 -Jeremiah
>> 


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2024-09-19 Thread Patrick McFadin
Thanks for reviving this one!

On Wed, Sep 18, 2024 at 12:06 AM guo Maxwell  wrote:

> Is there any update on this topic? It seems that things could make big
> progress if Jake Luciani can find someone who can make the
> FileSystemProvider code accessible.
>
> Jon Haddad  于2023年12月16日周六 05:29写道:
>
>> At a high level I really like the idea of being able to better leverage
>> cheaper storage especially object stores like S3.
>>
>> One important thing though - I feel pretty strongly that there's a big,
>> deal breaking downside.   Backups, disk failure policies, snapshots and
>> possibly repairs would get more complicated which haven't been particularly
>> great in the past, and of course there's the issue of failure recovery
>> being only partially possible if you're looking at a durable block store
>> paired with an ephemeral one with some of your data not replicated to the
>> cold side.  That introduces a failure case that's unacceptable for most
>> teams, which results in needing to implement potentially 2 different backup
>> solutions.  This is operationally complex with a lot of surface area for
>> headaches.  I think a lot of teams would probably have an issue with the
>> big question mark around durability and I probably would avoid it myself.
>>
>> On the other hand, I'm +1 if we approach it something slightly
>> differently - where _all_ the data is located on the cold storage, with the
>> local hot storage used as a cache.  This means we can use the cold
>> directories for the complete dataset, simplifying backups and node
>> replacements.
>>
>> For a little background, we had a ticket several years ago where I
>> pointed out it was possible to do this *today* at the operating system
>> level as long as you're using block devices (vs an object store) and LVM
>> [1].  For example, this works well with GP3 EBS w/ low IOPS provisioning +
>> local NVMe to get a nice balance of great read performance without going
>> nuts on the cost for IOPS.  I also wrote about this in a little more detail
>> in my blog [2].  There's also the new mount point tech in AWS which pretty
>> much does exactly what I've suggested above [3] that's probably worth
>> evaluating just to get a feel for it.
>>
>> I'm not insisting we require LVM or the AWS S3 fs, since that would rule
>> out other cloud providers, but I am pretty confident that the entire
>> dataset should reside in the "cold" side of things for the practical and
>> technical reasons I listed above.  I don't think it massively changes the
>> proposal, and should simplify things for everyone.
>>
>> Jon
>>
>> [1] https://issues.apache.org/jira/browse/CASSANDRA-8460
>> [2] https://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/
>> [3]
>> https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/
>>
>>
>> On Thu, Dec 14, 2023 at 1:56 AM Claude Warren  wrote:
>>
>>> Is there still interest in this?  Can we get some points down on
>>> electrons so that we all understand the issues?
>>>
>>> While it is fairly simple to redirect the read/write to something other
>>> than the local system for a single node this will not solve the problem for
>>> tiered storage.
>>>
>>> Tiered storage will require that on read/write the primary key be
>>> assessed and determine if the read/write should be redirected.  My
>>> reasoning for this statement is that in a cluster with a replication factor
>>> greater than 1 the node will store data for the keys that would be
>>> allocated to it in a cluster with a replication factor = 1, as well as some
>>> keys from nodes earlier in the ring.
>>>
>>> Even if we can get the primary keys for all the data we want to write to
>>> "cold storage" to map to a single node a replication factor > 1 means that
>>> data will also be placed in "normal storage" on subsequent nodes.
>>>
>>> To overcome this, we have to explore ways to route data to different
>>> storage based on the keys and that different storage may have to be
>>> available on _all_  the nodes.
>>>
>>> Have any of the partial solutions mentioned in this email chain (or
>>> others) solved this problem?
>>>
>>> Claude
>>>
>>
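
The per-key routing Claude describes — deciding at read/write time whether a
partition belongs on "normal" or "cold" storage, with both tiers available on
all nodes because replication spreads any key across several of them — could be
sketched roughly as below. All names here are invented for illustration; this
is not Cassandra code, and the predicate stands in for whatever tiering policy
(e.g. a time-range cutoff) a real implementation would use.

```java
import java.nio.file.Path;
import java.util.function.Predicate;

// Hypothetical sketch: route a partition's data to hot or cold storage
// based on a per-key predicate. Every node needs both tiers mounted,
// since a replication factor > 1 places any given key on several nodes.
public class TieredStorageRouter {
    private final Path hotDir;
    private final Path coldDir;
    private final Predicate<String> isCold; // e.g. "key falls in an archived time range"

    public TieredStorageRouter(Path hotDir, Path coldDir, Predicate<String> isCold) {
        this.hotDir = hotDir;
        this.coldDir = coldDir;
        this.isCold = isCold;
    }

    /** Pick the directory a read/write for this partition key should target. */
    public Path directoryFor(String partitionKey) {
        return isCold.test(partitionKey) ? coldDir : hotDir;
    }
}
```

The hard part Claude raises is precisely that this decision must be made on
every node holding a replica, not just a designated "cold" node, which is why
the tiered directories have to exist cluster-wide.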


Re: [DISCUSS] CEP-39: Cost Based Optimizer

2024-09-19 Thread Patrick McFadin
Did this get resolved? Is it ready for a VOTE thread?

On Tue, Jan 2, 2024 at 1:41 PM Benedict  wrote:

> The CEP expressly includes an item for coordinated cardinality estimation,
> by producing whole cluster summaries. I’m not sure if you addressed this in
> your feedback, it’s not clear what you’re referring to with distributed
> estimates, but avoiding this was expressly the driver of my suggestion to
> instead include the plan as a payload (which offers users some additional
> facilities).
>
>
> On 2 Jan 2024, at 21:26, Ariel Weisberg  wrote:
>
> 
> Hi,
>
> I am burying the lede, but it's important to keep an eye on
> runtime-adaptive vs planning time optimization as the cost/benefits vary
> greatly between the two and runtime adaptive can be a game changer.
> Basically CBO optimizes for query efficiency and startup time at the
> expense of not handling some queries well and runtime adaptive is
> cheap/free for expensive queries and can handle cases that CBO can't.
>
> Generally speaking I am +1 on the introduction of a CBO, since it seems
> like there exists things that would benefit from it materially (and many of
> the associated refactors/cleanup) and it aligns with my north star that
> includes joins.
>
> Do we all have the same north star that Cassandra should eventually
> support joins? Just curious if that is controversial.
>
> I don't feel like this CEP in particular should need to really nail down
> exactly how distributed estimates work since we can start with using local
> estimates as a proxy for the entire cluster and then improve. If someone
> has bandwidth to do a separate CEP for that then sure that would be great,
> but this seems big enough in scope already.
>
> RE testing, continuity of performance of queries is going to be really
> important. I would really like to see that we have a fuzzed the space
> deterministically and via a collection of hand rolled cases, and can
> compare performance between versions to catch queries that regress.
> Hopefully we can agree on a baseline for releasing where we know what prior
> release to compare to and what acceptable changes in performance are.
>
> RE prepared statements - It feels to me like trying to send the plan blob
> back and forth to get more predictable, but not absolutely predictable,
> plans is not worth it? Feels like a lot for an incremental improvement over
> a baseline that doesn't exist yet, IOW it doesn't feel like something for
> V1. Maybe it ends up in YAGNI territory.
>
> The north star of predictable behavior for queries is a *very* important
> one because it means the world to users, but CBO is going to make mistakes
> all over the place. It's simply unachievable even with accurate statistics
> because it's very hard to tell how predicates will behave on a column.
>
> This segues nicely into the importance of adaptive execution :-) It's how
> you rescue the queries that CBO doesn't handle  well for any reason such as
> bugs, bad statistics, missing features. Re-ordering predicate evaluation,
> switching indexes, and re-ordering joins can all be done on the fly.
>
> CBO is really a performance optimization since adaptive approaches will
> allow any query to complete with some wasted resources.
>
> If my pager were waking me up at night and I wanted to stem the bleeding I
> would reach for runtime adaptive over CBO because I know it will catch more
> cases even if it is slower to execute up front.
>
> What is the nature of the queries we are looking solve right now? Are they
> long running heavy hitters, or short queries that explode if run
> incorrectly, or a mix of both?
>
> Ariel
>
> On Tue, Dec 12, 2023, at 8:29 AM, Benjamin Lerer wrote:
>
> Hi everybody,
>
> I would like to open the discussion on the introduction of a cost based
> optimizer to allow Cassandra to pick the best execution plan based on the
> data distribution, thereby improving overall query performance.
>
> This CEP should also lay the groundwork for the future addition of
> features like joins, subqueries, OR/NOT and index ordering.
>
> The proposal is here:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>
> Thank you in advance for your feedback.
>
>
>
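
The runtime-adaptive re-ordering of predicate evaluation Ariel mentions could
be sketched as below — a purely illustrative toy, not Cassandra code: track the
observed pass rate of each predicate and evaluate the most selective (most
likely to fail) ones first, so a bad initial ordering corrects itself on the
fly without any planner involvement.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of runtime-adaptive predicate re-ordering.
// Each predicate's selectivity is measured as rows are filtered, and
// predicates are re-sorted so likely-failing ones short-circuit early.
public class AdaptiveFilter<T> {
    private static class Stats<T> {
        final Predicate<T> predicate;
        long tested = 0, passed = 0;
        Stats(Predicate<T> p) { predicate = p; }
        double passRate() { return tested == 0 ? 1.0 : (double) passed / tested; }
    }

    private final List<Stats<T>> predicates = new ArrayList<>();

    public AdaptiveFilter<T> add(Predicate<T> p) {
        predicates.add(new Stats<>(p));
        return this;
    }

    /** Evaluate predicates most-selective-first, updating observed selectivity. */
    public boolean test(T row) {
        // Cheapest-to-fail (lowest pass rate) predicates run first.
        predicates.sort(Comparator.comparingDouble(s -> s.passRate()));
        for (Stats<T> s : predicates) {
            s.tested++;
            if (!s.predicate.test(row)) return false;
            s.passed++;
        }
        return true;
    }
}
```

A real engine would re-sort periodically rather than per row, but the idea is
the same: no cost model is needed to recover from a mis-ordered filter chain.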


Re: [DISCUSSION] CEP-38: CQL Management API

2024-09-19 Thread Dinesh Joshi
No. Maxim and I have had some offline discussions. We need to make some
changes before we can be ready to vote on it.

On Thu, Sep 19, 2024 at 11:09 AM Patrick McFadin  wrote:

> There is no VOTE thread for this CEP. Is this ready for one?
>
> On Tue, Jan 9, 2024 at 3:28 AM Maxim Muzafarov  wrote:
>
>> Jon,
>>
>> That sounds good.  Let's make these commands rely on the settings
>> virtual table and keep the initial changes as minimal as possible.
>>
>> We've also scheduled a Cassandra Contributor Meeting on January 30th
>> 2024, so I'll prepare some slides with everything we've got so far and
>> try to prepare some drafts to demonstrate the design.
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Contributor+Meeting
>>
>> On Tue, 9 Jan 2024 at 00:55, Jon Haddad  wrote:
>> >
>> > It's great to see where this is going and thanks for the discussion on
>> the ML.
>> >
>> > Personally, I think adding two new ways of accomplishing the same thing
>> is a net negative.  It means we need more documentation and creates
>> inconsistencies across tools and users.  The tradeoffs you've listed are
>> worth considering, but in my opinion adding 2 new ways to accomplish the
>> same thing hurts the project more than it helps.
>> >
>> > > - I'd like to see a symmetry between the JMX and CQL APIs, so that
>> users will have a sense of the commands they are using and are less
>> > likely to check the documentation;
>> >
>> > I've worked with a couple hundred teams and I can only think of a few
>> who use JMX directly.  It's done very rarely.  After 10 years, I still have
>> to look up the JMX syntax to do anything useful, especially if there's any
>> quoting involved.  Power users might know a handful of JMX commands by
>> heart, but I suspect most have a handful of bash scripts they use instead,
>> or have a sidecar.  I also think very few users will migrate their
>> management code from JMX to CQL, nor do I imagine we'll move our own tools
>> until the `disablebinary` problem is solved.
>> >
>> > > - It will be easier for us to move the nodetool from the jmx client
>> that is used under the hood to an implementation based on a java-driver and
>> use the CQL for the same;
>> >
>> > I can't imagine this would make a material difference.  If someone's
>> rewriting a nodetool command, how much time will be spent replacing the JMX
>> call with a CQL one?  Looking up a virtual table isn't going to be what
>> consumes someone's time in this process.  Again, this won't be done without
>> solving `nodetool disablebinary`.
>> >
>> > > if we have cassandra-15254 merged, it will cost almost nothing to
>> support the exec syntax for setting properties;
>> >
>> > My concern is more about the weird user experience of having two ways
>> of doing the same thing, less about the technical overhead of adding a
>> second implementation.  I propose we start simple, see if any of the
>> reasons you've listed are actually a real problem, then if they are,
>> address the issue in a follow up.
>> >
>> > If I'm wrong, it sounds like it's fairly easy to add `exec` for
>> changing configs.  If I'm right, we'll have two confusing syntaxes
>> forever.  It's a lot easier to add something later than take it away.
>> >
>> > How does that sound?
>> >
>> > Jon
>> >
>> >
>> >
>> >
>> > On Mon, Jan 8, 2024 at 7:55 PM Maxim Muzafarov 
>> wrote:
>> >>
>> >> > Some operations will no doubt require a stored procedure syntax, but
>> perhaps it would be a good idea to split the work into two:
>> >>
>> >> These are exactly the first steps I have in mind:
>> >>
>> >> [Ready for review]
>> >> Allow UPDATE on settings virtual table to change running configurations
>> >> https://issues.apache.org/jira/browse/CASSANDRA-15254
>> >>
>> >> This issue is specifically aimed at changing the configuration
>> >> properties we are talking about (value is in yaml format):
>> >> e.g. UPDATE system_views.settings SET compaction_throughput = 128Mb/s;
>> >>
>> >> [Ready for review]
>> >> Expose all table metrics in virtual table
>> >> https://issues.apache.org/jira/browse/CASSANDRA-14572
>> >>
>> >> This is to observe the running configuration and all available metrics:
>> >> e.g. select * from system_views.thread_pools;
>> >>
>> >>
>> >> I hope both of the issues above will become part of the trunk branch
>> >> before we move on to the CQL management commands. In this topic, I'd
>> >> like to discuss the design of the CQL API, and gather feedback, so
>> >> that I can prepare a draft of changes to look at without any
>> >> surprises, and that's exactly what this discussion is about.
>> >>
>> >>
>> >> cqlsh> UPDATE system.settings SET compaction_throughput = 128;
>> >> cqlsh> exec setcompactionthroughput 128
>> >>
>> >> I don't mind removing the exec command from the CQL command API which
>> >> is intended to change settings. Personally, I see the second option as
>> >> just an alias for the first command, and in fact, they will have the
>> >> same implementation und

Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-19 Thread David Capwell
I personally don’t mind switching off Chronicle Queue.  I have a transformer 
function to convert the FQL logs to Thrift (don’t judge) and use easy-cas to 
replay on a cluster… replaying FQL from Chronicle Queue was far slower than 
Thrift and it was hard to push the cluster, as the client was the bottleneck… 
switching off it let me actually make Cassandra the bottleneck…

>>> No, there is another perfectly sensible option: just implement a simple 
>>> serialisation format ourselves.

My one issue with this is we need to ask who the target audience is.  Trying to 
add FQL replay to easy-cas was a pain for 2 reasons: Chronicle Queue is slow, 
and custom C* serializers must be on the class path (which brings a ton of 
baggage with it)… 

For me FQL has 2 use cases:

1) analytics: what are people actually doing, and what are their frequencies?
2) replay

In both cases custom serializers are a pain due to the baggage they bring and 
their limiting nature… what if I want a Go-based FQL replay?  I need Java code 
from cassandra-all…

I personally favor serializers like protobuf/thrift as they are portable and 
can be used by users without issues.  As for the log format itself… a super 
simple custom log format that is easy to read is fine by me… I am cool with the 
log being custom as I don’t know a good portable log format off the top of my 
head… a simple thing like the following works for me:

Header: lengths, checksum, etc.
Body: std serializer
+
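
A minimal sketch of that kind of framing — a length plus CRC32 header around an
opaque serialized body — might look like the following. This is purely
illustrative (the class and method names are invented, not a Cassandra API),
and the body bytes could come from any portable serializer such as protobuf or
Thrift:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch of a trivial record framing for a binary log:
// each record is [length:int][crc32:long][payload], where the payload
// is whatever the chosen serializer (protobuf, Thrift, ...) produced.
public class SimpleBinaryLog {
    /** Wrap a serialized payload in a [length][crc32] header. */
    public static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + payload.length);
        buf.putInt(payload.length);
        buf.putLong(crc.getValue());
        buf.put(payload);
        return buf.array();
    }

    /** Read one record back, verifying the checksum before trusting the body. */
    public static byte[] unframe(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int length = buf.getInt();
        long expected = buf.getLong();
        byte[] payload = new byte[length];
        buf.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload);
        if (crc.getValue() != expected)
            throw new IllegalStateException("checksum mismatch: corrupt record");
        return payload;
    }

    public static void main(String[] args) {
        byte[] payload = "SELECT * FROM ks.tbl".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] roundTrip = unframe(frame(payload));
        System.out.println(new String(roundTrip, java.nio.charset.StandardCharsets.UTF_8));
    }
}
```

Because the header is this simple, a consumer in Go or any other language only
needs a protobuf/Thrift decoder for the body — no cassandra-all on the
classpath.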

> On Sep 19, 2024, at 9:14 AM, C. Scott Andreas  wrote:
> 
> Agree with Benedict's proposal here.
> 
> In circumstances when I've needed to capture and work with FQL, I've found it 
> cumbersome to work with Chronicle. The dial-home functionality and release 
> process changes put it over the top for me.
> 
> – Scott
> 
>> On Sep 19, 2024, at 8:40 AM, Josh McKenzie  wrote:
>> 
>> 
>>> there is another perfectly sensible option
>> My apologies; I wasn't clear. If we choose to continue to use chronicle 
>> queue, what I enumerated was the only logical option I saw for us.
>> 
>> Altogether I think we should just move away from the library as you've laid 
>> out here Benedict.
>> 
>> On Thu, Sep 19, 2024, at 11:34 AM, Benedict wrote:
>>> 
>>> No, there is another perfectly sensible option: just implement a simple 
>>> serialisation format ourselves.
>>> 
>>> I am against forking their code; that is a much higher maintenance burden 
>>> than just writing something simple ourselves. We’ve spent longer 
>>> collectively discussing and maintaining this dependency than it would take 
>>> to implement the features we use.
>>> 
>>> I still have not heard a compelling reason we adopted it as a dependency in 
>>> the first place.
>>> 
 On 19 Sep 2024, at 16:26, Josh McKenzie  wrote:
 
> a jerk move, but they started it with this weird release model
 I think that's the only option given their release model and lack of 
 backporting bugfixes to the latest ea. Either you run tip of the spear, 
 pay them for bugfixes, or run what's effectively an unsupported LTS in the 
 form of ea.
 
 So doesn't seem like a jerk move to me as much as it seems like an 
 eventuality of their release model.
 
 On Wed, Sep 18, 2024, at 7:02 PM, Nate McCall wrote:
> I feel like a group of us discussed this IRL a bit at ApacheCon in Vegas 
> ~ 2019 maybe? Anyhoo, the tidbit sticking in my mind was someone 
> explaining the string operations overhead in the JVM of log concatenation 
> vs slapping binary to CQ’s off heap-and-append operation was substantial. 
> 
> We could hostile fork and bring the bits we use in tree (a jerk move, but 
> they started it with this weird release model). I’d rather avoid this, 
> but it’s an option seeing as how it’s ASFv2. 
> 
> On Thu, 19 Sep 2024 at 5:08 AM, Jeremiah Jordan
> <jeremiah.jor...@gmail.com> wrote:
> 
>> When it comes to alternatives, what about logback + slf4j? It has 
>> appenders where we want, it is sync / async, we can code some nio 
>> appender too I guess, it logs it as text into a file so we do not need 
>> any special tooling to review that. For tailing which Chronicle also 
>> offers, I guess "tail -f that.log" just does the job? logback even rolls 
>> the files after they are big enough so it rolls the files the same way 
>> after some configured period / size as Chronicle does (It even 
>> compresses the logs).
> 
> 
> Yes it was considered.  The whole point was to have a binary log because 
> serialization to/from (remember replay is part of this) text explodes
> the size on disk and in memory as well as the processing time required 
> and does not meet the timing requirements of fqltool.
> 
> -Jeremiah
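Benedict's suggestion upthread — "just implement a simple serialisation format ourselves" — amounts to little more than length-prefixed framing over an append-only file. A minimal sketch (illustrative only; the framing and names here are invented, not Cassandra's actual FQL format):

```python
import io
import struct

def append_record(buf: io.BytesIO, payload: bytes) -> None:
    # Length-prefix framing: a 4-byte big-endian length, then raw payload
    # bytes. Already-serialised bytes go in verbatim -- no text encode/decode
    # on the hot path, which is Jeremiah's point about binary logs.
    buf.write(struct.pack(">I", len(payload)))
    buf.write(payload)

def replay(buf: io.BytesIO):
    # Replay (used by fqltool-like consumers) is a sequential scan.
    buf.seek(0)
    while True:
        header = buf.read(4)
        if len(header) < 4:
            return
        (length,) = struct.unpack(">I", header)
        yield buf.read(length)

log = io.BytesIO()
append_record(log, b"SELECT * FROM ks.tb")
append_record(log, b"\x00\x01binary-bind-values")
records = list(replay(log))
```

A real implementation would add file rolling and checksums, but the framing itself is the whole format.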
 



Re: [Discuss] Repair inside C*

2024-09-19 Thread Patrick McFadin
Is this CEP ready for a VOTE thread?
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution

On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia <
chovatia.jayd...@gmail.com> wrote:

> Thanks, Josh. I've just updated the CEP
> 
> and included all the solutions you mentioned below.
>
> Jaydeep
>
> On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie 
> wrote:
>
>> Very late response from me here (basically necro'ing this thread).
>>
>> I think it'd be useful to get this condensed into a CEP that we can then
>> discuss in that format. It's clearly something we all agree we need and
>> having an implementation that works, even if it's not in your preferred
>> execution domain, is vastly better than nothing IMO.
>>
>> I don't have cycles (nor background ;) ) to do that, but it sounds like
>> you do Jaydeep given the implementation you have on a private fork + design.
>>
>> A non-exhaustive list of things that might be useful incorporating into
>> or referencing from a CEP:
>> Slack thread:
>> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>> Joey's old C* ticket:
>> https://issues.apache.org/jira/browse/CASSANDRA-14346
>> Even older automatic repair scheduling:
>> https://issues.apache.org/jira/browse/CASSANDRA-10070
>> Your design gdoc:
>> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
>> PR with automated repair:
>> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>>
>> My intuition is that we're all basically in agreement that this is
>> something the DB needs, we're all willing to bikeshed for our personal
>> preference on where it lives and how it's implemented, and at the end of
>> the day, code talks. I don't think anyone's said they'll die on the hill of
>> implementation details, so that feels like CEP time to me.
>>
>> If you were willing and able to get a CEP together for automated repair
>> based on the above material, given you've done the work and have the proof
>> points it's working at scale, I think this would be a *huge contribution*
>> to the community.
>>
>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
>>
>> Is anyone going to file an official CEP for this?
>> As mentioned in this email thread, here is one of the solution's design
>> doc
>> 
>> and source code on a private Apache Cassandra patch. Could you go through
>> it and let me know what you think?
>>
>> Jaydeep
>>
>> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad 
>> wrote:
>>
>> > That said I would happily support an effort to bring repair scheduling
>> to the sidecar immediately. This has nothing blocking it, and would
>> potentially enable the sidecar to provide an official repair scheduling
>> solution that is compatible with current or even previous versions of the
>> database.
>>
>> This is something I hadn't thought much about, and is a pretty good
>> argument for using the sidecar initially.  There's a lot of deployments out
>> there and having an official repair option would be a big win.
>>
>>
>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
>> > I agree that it would be ideal for Cassandra to have a repair scheduler
>> in-DB.
>> >
>> > That said I would happily support an effort to bring repair scheduling
>> to the sidecar immediately. This has nothing blocking it, and would
>> potentially enable the sidecar to provide an official repair scheduling
>> solution that is compatible with current or even previous versions of the
>> database.
>> >
>> > Once TCM has landed, we’ll have much stronger primitives for repair
>> orchestration in the database itself. But I don’t think that should block
>> progress on a repair scheduling solution in the sidecar, and there is
>> nothing that would prevent someone from continuing to use a sidecar-based
>> solution in perpetuity if they preferred.
>> >
>> > - Scott
>> >
>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad 
>> wrote:
>> > >
>> > > I'm 100% in favor of repair being part of the core DB, not the
>> sidecar.  The current (and past) state of things where running the DB
>> correctly *requires* running a separate process (either community
>> maintained or official C* sidecar) is incredibly painful for folks.  The
>> idea that your data integrity needs to be opt-in has never made sense to me
>> from the perspective of either the product or the end user.
>> > >
>> > > I've worked with way too many teams that have either configured this
>> incorrectly or not at all.
>> > >
>> > > Ideally Cassandra would ship with repair built in and on by default.
>> Power users can disable if they want to continue to maintain their own
>> repair tooling for some reason.
>> > >
>> > > Jon
>> > >
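The scheduling loop the thread keeps circling — repair each token sub-range on a cadence, oldest first — can be sketched in a few lines. This is illustrative pseudologic only, not the design in Jaydeep's patch or the CEP:

```python
import time

def pick_next(ranges, last_repaired, min_interval_s):
    # Pick the sub-range whose last successful repair is oldest, skipping
    # any range repaired within the minimum interval. Never-repaired ranges
    # (absent from the map) sort first via the default of 0.
    due = [r for r in ranges
           if time.time() - last_repaired.get(r, 0) >= min_interval_s]
    return min(due, key=lambda r: last_repaired.get(r, 0)) if due else None

# Hypothetical token sub-ranges and last-repair timestamps.
ranges = [(0, 100), (100, 200), (200, 300)]
last_repaired = {(0, 100): 50.0, (100, 200): 10.0}  # (200, 300) never repaired
nxt = pick_next(ranges, last_repaired, min_interval_s=0)
```

Whether that loop lives in the database or the sidecar is exactly the bikeshed in this thread; the loop itself is the same either way.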

Re: [DISCUSSION] CEP-38: CQL Management API

2024-09-19 Thread Patrick McFadin
There is no VOTE thread for this CEP. Is this ready for one?

On Tue, Jan 9, 2024 at 3:28 AM Maxim Muzafarov  wrote:

> Jon,
>
> That sounds good.  Let's make these commands rely on the settings
> virtual table and keep the initial changes as minimal as possible.
>
> We've also scheduled a Cassandra Contributor Meeting on January 30th
> 2024, so I'll prepare some slides with everything we've got so far and
> try to prepare some drafts to demonstrate the design.
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Contributor+Meeting
>
> On Tue, 9 Jan 2024 at 00:55, Jon Haddad  wrote:
> >
> > It's great to see where this is going and thanks for the discussion on
> the ML.
> >
> > Personally, I think adding two new ways of accomplishing the same thing
> is a net negative.  It means we need more documentation and creates
> inconsistencies across tools and users.  The tradeoffs you've listed are
> worth considering, but in my opinion adding 2 new ways to accomplish the
> same thing hurts the project more than it helps.
> >
> > > - I'd like to see a symmetry between the JMX and CQL APIs, so that
> users will have a sense of the commands they are using and are less
> > likely to check the documentation;
> >
> > I've worked with a couple hundred teams and I can only think of a few
> who use JMX directly.  It's done very rarely.  After 10 years, I still have
> to look up the JMX syntax to do anything useful, especially if there's any
> quoting involved.  Power users might know a handful of JMX commands by
> heart, but I suspect most have a handful of bash scripts they use instead,
> or have a sidecar.  I also think very few users will migrate their
> management code from JMX to CQL, nor do I imagine we'll move our own tools
> until the `disablebinary` problem is solved.
> >
> > > - It will be easier for us to move the nodetool from the jmx client
> that is used under the hood to an implementation based on a java-driver and
> use the CQL for the same;
> >
> > I can't imagine this would make a material difference.  If someone's
> rewriting a nodetool command, how much time will be spent replacing the JMX
> call with a CQL one?  Looking up a virtual table isn't going to be what
> consumes someone's time in this process.  Again, this won't be done without
> solving `nodetool disablebinary`.
> >
> > > if we have cassandra-15254 merged, it will cost almost nothing to
> support the exec syntax for setting properties;
> >
> > My concern is more about the weird user experience of having two ways of
> doing the same thing, less about the technical overhead of adding a second
> implementation.  I propose we start simple, see if any of the reasons
> you've listed are actually a real problem, then if they are, address the
> issue in a follow up.
> >
> > If I'm wrong, it sounds like it's fairly easy to add `exec` for changing
> configs.  If I'm right, we'll have two confusing syntaxes forever.  It's a
> lot easier to add something later than take it away.
> >
> > How does that sound?
> >
> > Jon
> >
> >
> >
> >
> > On Mon, Jan 8, 2024 at 7:55 PM Maxim Muzafarov 
> wrote:
> >>
> >> > Some operations will no doubt require a stored procedure syntax, but
> perhaps it would be a good idea to split the work into two:
> >>
> >> These are exactly the first steps I have in mind:
> >>
> >> [Ready for review]
> >> Allow UPDATE on settings virtual table to change running configurations
> >> https://issues.apache.org/jira/browse/CASSANDRA-15254
> >>
> >> This issue is specifically aimed at changing the configuration
> >> properties we are talking about (value is in yaml format):
> >> e.g. UPDATE system_views.settings SET compaction_throughput = 128Mb/s;
> >>
> >> [Ready for review]
> >> Expose all table metrics in virtual table
> >> https://issues.apache.org/jira/browse/CASSANDRA-14572
> >>
> >> This is to observe the running configuration and all available metrics:
> >> e.g. select * from system_views.thread_pools;
> >>
> >>
> >> I hope both of the issues above will become part of the trunk branch
> >> before we move on to the CQL management commands. In this topic, I'd
> >> like to discuss the design of the CQL API, and gather feedback, so
> >> that I can prepare a draft of changes to look at without any
> >> surprises, and that's exactly what this discussion is about.
> >>
> >>
> >> cqlsh> UPDATE system.settings SET compaction_throughput = 128;
> >> cqlsh> exec setcompactionthroughput 128
> >>
> >> I don't mind removing the exec command from the CQL command API which
> >> is intended to change settings. Personally, I see the second option as
> >> just an alias for the first command, and in fact, they will have the
> >> same implementation under the hood, so please consider the rationale
> >> below:
> >>
> >> - I'd like to see a symmetry between the JMX and CQL APIs, so that
> >> users will have a sense of the commands they are using and are less
> >> likely to check the documentation;
> >> - It will be easier for us to move the nodetool from the jmx client
> >> that is used under the hood to an implementation based on a java-driver
> >> and use the CQL for the same;
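Maxim's "alias" rationale — that `exec setcompactionthroughput 128` and the settings-table UPDATE would share one implementation — can be illustrated with a toy model. All names below are hypothetical, not Cassandra internals:

```python
# Toy settings store standing in for the settings virtual table.
settings = {"compaction_throughput_mb_per_sec": 64}

def update_setting(name: str, value) -> None:
    # The single underlying setter, reached by:
    #   UPDATE system_views.settings SET compaction_throughput = 128;
    if name not in settings:
        raise KeyError(name)
    settings[name] = value

def exec_command(command: str, arg) -> None:
    # The proposed `exec` syntax is just an alias table in front of the
    # same setter -- the "costs almost nothing" argument in this thread.
    aliases = {"setcompactionthroughput": "compaction_throughput_mb_per_sec"}
    update_setting(aliases[command], arg)

update_setting("compaction_throughput_mb_per_sec", 96)  # UPDATE path
exec_command("setcompactionthroughput", 128)            # exec path
```

Jon's counterpoint stands independently of the implementation cost: two surfaces for one action is a user-experience question, not an engineering one.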

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-09-19 Thread Patrick McFadin
The work has begun but we don't have a VOTE thread for this CEP. Can one
get started?

On Mon, May 6, 2024 at 9:24 PM Jaydeep Chovatia 
wrote:

> Sure, Caleb. I will include the work as part of CASSANDRA-19534
>  in the CEP-41.
>
> Jaydeep
>
> On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe 
> wrote:
>
>> FYI, there is some ongoing sort-of-related work going on in
>> CASSANDRA-19534 
>>
>> On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia <
>> chovatia.jayd...@gmail.com> wrote:
>>
>>> Just created an official CEP-41
>>> 
>>> incorporating the feedback from this discussion. Feel free to let me know
>>> if I may have missed some important feedback in this thread that is not
>>> captured in the CEP-41.
>>>
>>> Jaydeep
>>>
>>> On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia <
>>> chovatia.jayd...@gmail.com> wrote:
>>>
 Thanks, Josh. I will file an official CEP with all the details in a few
 days and update this thread with that CEP number.
 Thanks a lot everyone for providing valuable insights!

 Jaydeep

 On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie 
 wrote:

> Do folks think we should file an official CEP and take it there?
>
> +1 here.
>
> Synthesizing your gdoc, Caleb's work, and the feedback from this
> thread into a draft seems like a solid next step.
>
> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>
> I see a lot of great ideas being discussed or proposed in the past to
> cover the most common rate limiter candidate use cases. Do folks think we
> should file an official CEP and take it there?
>
> Jaydeep
>
> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe <
> calebrackli...@gmail.com> wrote:
>
> I just remembered the other day that I had done a quick writeup on the
> state of compaction stress-related throttling in the project:
>
>
> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>
> I'm sure most of it is old news to the people on this thread, but I
> figured I'd post it just in case :)
>
> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie 
> wrote:
>
>
> 2.) We should make sure the links between the "known" root causes of
> cascading failures and the mechanisms we introduce to avoid them remain
> very strong.
>
> Seems to me that our historical strategy was to address individual
> known cases one-by-one rather than looking for a more holistic
> load-balancing and load-shedding solution. While the engineer in me likes
> the elegance of a broad, more-inclusive *actual SEDA-like* approach,
> the pragmatist in me wonders how far we think we are today from a stable
> set-point.
>
> i.e. are we facing a handful of cases where nodes can still get pushed
> over and then cascade that we can surgically address, or are we facing a
> broader lack of back-pressure that rears its head in different domains
> (client -> coordinator, coordinator -> replica, internode with other
> operations, etc) at surprising times and should be considered more
> holistically?
>
> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>
> I almost forgot CASSANDRA-15817, which introduced
> reject_repair_compaction_threshold, which provides a mechanism to stop
> repairs while compaction is underwater.
>
> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe 
> wrote:
>
> 
> Hey all,
>
> I'm a bit late to the discussion. I see that we've already discussed
> CASSANDRA-15013
>  and
> CASSANDRA-16663
>  at least in
> passing. Having written the latter, I'd be the first to admit it's a crude
> tool, although it's been useful here and there, and provides a couple
> primitives that may be useful for future work. As Scott mentions, while it
> is configurable at runtime, it is not adaptive, although we did
> make configuration easier in CASSANDRA-17423
> . It also is
> global to the node, although we've lightly discussed some ideas around
> making it more granular. (For example, keyspace-based limiting, or 
> limiting
> "domains" tagged by the client in requests, could be interesting.) It also
> does not deal with inter-node traffic, of course.
>
> Something we've not yet mentioned (that does address internode
> traffic) is CASSANDRA-17324
> , which I
> proposed shortly after working on the native
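For context on the primitives under discussion: node-level throttling of the kind described above is commonly built on a token bucket. A minimal sketch (illustrative only, unrelated to the implementation in CASSANDRA-16663 or any other ticket):

```python
class TokenBucket:
    """Allow a sustained rate with a bounded burst."""

    def __init__(self, rate_per_s: float, burst: float):
        self.rate = rate_per_s   # tokens added per second
        self.capacity = burst    # maximum stored tokens
        self.tokens = burst
        self.last = 0.0

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill based on elapsed time, then spend if enough tokens remain.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate_per_s=10, burst=5)
accepted = sum(bucket.try_acquire(now=0.0) for _ in range(8))  # burst caps at 5
later = bucket.try_acquire(now=1.0)  # a second's refill restores capacity
```

The hard part raised in this thread is not the bucket but choosing and adapting the rate — per keyspace, per client "domain", or globally — which is what makes an adaptive design attractive.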

Re: [VOTE] CEP-42: Constraints Framework

2024-09-19 Thread Patrick McFadin
I'm going to cap this thread. Vote passes with no binding -1s.

On Tue, Jul 2, 2024 at 2:25 PM Jordan West  wrote:

> +1
>
> On Tue, Jul 2, 2024 at 12:15 Francisco Guerrero 
> wrote:
>
>> +1
>>
>> On 2024/07/02 18:45:33 Josh McKenzie wrote:
>> > +1
>> >
>> > On Tue, Jul 2, 2024, at 1:18 PM, Abe Ratnofsky wrote:
>> > > +1 (nb)
>> > >
>> > >> On Jul 2, 2024, at 12:15 PM, Yifan Cai  wrote:
>> > >>
>> > >> +1 on CEP-42.
>> > >>
>> > >> - Yifan
>> > >>
>> > >> On Tue, Jul 2, 2024 at 5:17 AM Jon Haddad  wrote:
>> > >>> +1
>> > >>>
>> > >>> On Tue, Jul 2, 2024 at 5:06 AM  wrote:
>> >  +1
>> > 
>> > 
>> > > On Jul 1, 2024, at 8:34 PM, Doug Rohrer 
>> wrote:
>> > >
>> > > +1 (nb) - Thanks for all of the suggestions and Bernardo for
>> wrangling the CEP into shape!
>> > >
>> > > Doug
>> > >
>> > >> On Jul 1, 2024, at 3:06 PM, Dinesh Joshi 
>> wrote:
>> > >>
>> > >> +1
>> > >>
>> > >> On Mon, Jul 1, 2024 at 11:58 AM Ariel Weisberg <
>> ar...@weisberg.ws> wrote:
>> > >>> __
>> > >>> Hi,
>> > >>>
>> > >>> I am +1 on CEP-42 with the latest updates to the CEP to clarify
>> syntax, error messages, constraint naming and generated naming, alter/drop,
>> describe etc.
>> > >>>
>> > >>> I think this now tracks very closely to how other SQL databases
>> define constraints and the syntax is easily extensible to multi-column and
>> multi-table constraints.
>> > >>>
>> > >>> Ariel
>> > >>>
>> > >>> On Mon, Jul 1, 2024, at 9:48 AM, Bernardo Botella wrote:
>> >  With all the feedback that came in the discussion thread after
>> the call for votes, I’d like to extend the period another 72 hours starting
>> today.
>> > 
>> >  As before, a vote passes if there are at least 3 binding +1s
>> and no binding vetoes.
>> > 
>> >  Thanks,
>> >  Bernardo Botella
>> > 
>> > > On Jun 24, 2024, at 7:17 AM, Bernardo Botella <
>> conta...@bernardobotella.com> wrote:
>> > >
>> > > Hi everyone,
>> > >
>> > > I would like to start the voting for CEP-42.
>> > >
>> > > Proposal:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
>> > > Discussion:
>> https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
>> > >
>> > > The vote will be open for 72 hours. A vote passes if there
>> are at least 3 binding +1s and no binding vetoes.
>> > >
>> > > Thanks,
>> > > Bernardo Botella
>> > >>>
>>
>


Re: [VOTE] CEP-42: Constraints Framework

2024-09-19 Thread Bernardo Botella
Hi Patrick,

Thanks for taking a look at this and keeping the house tidy.

I announced the voting results on a separate thread:
https://lists.apache.org/thread/v73cwc8p80xx7zpkldjq6w1qrkf2k9h0

As a follow up, this is not stalled, and I’m currently working on a patch that 
will be soon available for review.

Thanks,
Bernardo


> On Sep 19, 2024, at 11:20 AM, Patrick McFadin  wrote:
> 
> I'm going to cap this thread. Vote passes with no binding -1s.
> 



Re: [DISCUSS] CEP-39: Cost Based Optimizer

2024-09-19 Thread David Capwell
I am personally in favor of the idea and some of the finer details can be 
worked around I think.

In https://issues.apache.org/jira/browse/CASSANDRA-19769 I added a AST for CQL 
for tests and improving our testing, which led me to file several tickets as it 
found bugs…. There is a large part of me that wants to bike shed and I have to 
tell myself that improving CQL != CBO…


> On Sep 19, 2024, at 11:10 AM, Patrick McFadin  wrote:
> 
> Did this get resolved? Is it ready for a VOTE thread?
> 
> On Tue, Jan 2, 2024 at 1:41 PM Benedict  wrote:
>> The CEP expressly includes an item for coordinated cardinality estimation, 
>> by producing whole cluster summaries. I’m not sure if you addressed this in 
>> your feedback, it’s not clear what you’re referring to with distributed 
>> estimates, but avoiding this was expressly the driver of my suggestion to 
>> instead include the plan as a payload (which offers users some additional 
>> facilities). 
>> 
>> 
>>> On 2 Jan 2024, at 21:26, Ariel Weisberg  wrote:
>>> 
>>> 
>>> Hi,
>>> 
>>> I am burying the lede, but it's important to keep an eye on 
>>> runtime-adaptive vs planning time optimization as the cost/benefits vary 
>>> greatly between the two and runtime adaptive can be a game changer. 
>>> Basically CBO optimizes for query efficiency and startup time at the 
>>> expense of not handling some queries well and runtime adaptive is 
>>> cheap/free for expensive queries and can handle cases that CBO can't.
>>> 
>>> Generally speaking I am +1 on the introduction of a CBO, since it seems 
>>> like there exists things that would benefit from it materially (and many of 
>>> the associated refactors/cleanup) and it aligns with my north star that 
>>> includes joins.
>>> 
>>> Do we all have the same north star that Cassandra should eventually support 
>>> joins? Just curious if that is controversial.
>>> 
>>> I don't feel like this CEP in particular should need to really nail down 
>>> exactly how distributed estimates work since we can start with using local 
>>> estimates as a proxy for the entire cluster and then improve. If someone 
>>> has bandwidth to do a separate CEP for that then sure that would be great, 
>>> but this seems big enough in scope already.
>>> 
>>> RE testing, continuity of performance of queries is going to be really 
>>> important. I would really like to see that we have fuzzed the space
>>> deterministically and via a collection of hand rolled cases, and can 
>>> compare performance between versions to catch queries that regress. 
>>> Hopefully we can agree on a baseline for releasing where we know what prior 
>>> release to compare to and what acceptable changes in performance are.
>>> 
>>> RE prepared statements - It feels to me like trying to send the plan blob 
>>> back and forth to get more predictable, but not absolutely predictable, 
>>> plans is not worth it? Feels like a lot for an incremental improvement over 
>>> a baseline that doesn't exist yet, IOW it doesn't feel like something for 
>>> V1. Maybe it ends up in YAGNI territory.
>>> 
>>> The north star of predictable behavior for queries is a *very* important 
>>> one because it means the world to users, but CBO is going to make mistakes 
>>> all over the place. It's simply unachievable even with accurate statistics 
>>> because it's very hard to tell how predicates will behave on a column.
>>> 
>>> This segues nicely into the importance of adaptive execution :-) It's how 
>>> you rescue the queries that CBO doesn't handle well for any reason such as
>>> bugs, bad statistics, missing features. Re-ordering predicate evaluation, 
>>> switching indexes, and re-ordering joins can all be done on the fly.
>>> 
>>> CBO is really a performance optimization since adaptive approaches will 
>>> allow any query to complete with some wasted resources.
>>> 
>>> If my pager were waking me up at night and I wanted to stem the bleeding I 
>>> would reach for runtime adaptive over CBO because I know it will catch more 
>>> cases even if it is slower to execute up front.
>>> 
>>> What is the nature of the queries we are looking solve right now? Are they 
>>> long running heavy hitters, or short queries that explode if run 
>>> incorrectly, or a mix of both?
>>> 
>>> Ariel
>>> 
>>> On Tue, Dec 12, 2023, at 8:29 AM, Benjamin Lerer wrote:
 Hi everybody,
 
 I would like to open the discussion on the introduction of a cost based 
 optimizer to allow Cassandra to pick the best execution plan based on the 
 data distribution.Therefore, improving the overall query performance.
 
 This CEP should also lay the groundwork for the future addition of 
 features like joins, subqueries, OR/NOT and index ordering.
 
 The proposal is here: 
 https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
 
 Thank you in advance for your feedback.
>>> 
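The core move a cost-based optimizer makes with predicates can be shown in a few lines: order them by estimated selectivity so the filter that discards the most rows runs first. A toy sketch — the selectivity numbers and predicate names below are invented, not anything the CEP specifies:

```python
def order_predicates(predicates):
    # predicates: list of (expression, estimated_selectivity) pairs, where
    # selectivity is the estimated fraction of rows passing the filter.
    # Lower selectivity filters more, so evaluate it earliest.
    return sorted(predicates, key=lambda p: p[1])

plan = order_predicates([
    ("status = 'ACTIVE'", 0.6),
    ("user_id = ?", 0.0001),
    ("created_at > ?", 0.2),
])
```

Runtime-adaptive execution, as Ariel describes it, would redo this ordering mid-query when observed selectivities diverge from the estimates — which is why the two approaches complement rather than replace each other.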



Re: [Discuss] Repair inside C*

2024-09-19 Thread Josh McKenzie
Not quite; finishing touches on the CEP and design doc are in flight (as of 
last / this week).

Soon(tm).

On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:
> Is this CEP ready for a VOTE thread? 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution
> 
> On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia 
>  wrote:
>> Thanks, Josh. I've just updated the CEP 
>> 
>>  and included all the solutions you mentioned below.  
>> 
>> Jaydeep
>> 
Re: [DISCUSS] Introduce CREATE TABLE LIKE grammer

2024-09-19 Thread Patrick McFadin
Is this CEP ready for a VOTE thread?

On Sat, Aug 24, 2024 at 8:56 PM guo Maxwell  wrote:

> Thank you for your replies, I will prepare a CEP later.
>
> Patrick McFadin  于2024年8月20日周二 02:11写道:
>
>> +1 This is a CEP
>>
>> On Mon, Aug 19, 2024 at 10:50 AM Jon Haddad  wrote:
>>
>>> Given the fairly large surface area for this, i think it should be a
>>> CEP.
>>>
>>> —
>>> Jon Haddad
>>> Rustyrazorblade Consulting
>>> rustyrazorblade.com
>>>
>>>
>>> On Mon, Aug 19, 2024 at 10:44 AM Bernardo Botella <
>>> conta...@bernardobotella.com> wrote:
>>>
 Definitely a nice addition to CQL.

 Looking for inspiration at how Postgres and Mysql do that may also help
 with the final design (I like the WITH proposed by Stefan, but I would
 definitely take a look at the INCLUDING keyword proposed by Postgres).
 https://www.postgresql.org/docs/current/sql-createtable.html
 https://dev.mysql.com/doc/refman/8.4/en/create-table-like.html

 On top of that, and as part of the interesting questions, I would like
 to add the permissions to the mix. Both the question about copying them
 over (with a WITH keyword probably), and the need for read permissions on
 the source table as well.

 Bernardo

 On Aug 19, 2024, at 10:01 AM, Štefan Miklošovič 
 wrote:

 BTW this would be cool to do as well:

 ALTER TABLE ks.to_copy LIKE ks.tb WITH INDICES;

 This would mean that if we create a copy of a table, later we can
 decide that we need indices too, so we might "enrich" that table with
 indices from the old one without necessarily explicitly re-creating them on
 that new table.

 On Mon, Aug 19, 2024 at 6:55 PM Štefan Miklošovič <
 smikloso...@apache.org> wrote:

> I think this is an interesting idea worth exploring. I definitely
> agree with Benjamin who raised important questions which needs to be
> answered first. Also, what about triggers?
>
> It might be rather "easy" to come up with something simple but it
> should be a comprehensive solution with predictable behavior we all agree
> on.
>
> If the keyspace of a new table does not exist, we would need to create
> that one first. For simplicity, I would just make it a must to
> create the copy in the same keyspace. We might iterate on that in the future.
>
> UDTs are created per keyspace so there is nothing to re-create. We
> just need to reference them from the new table, right?
>
> Indexes and MVs are interesting but in theory they might be re-created
> too.
>
> Would it be appropriate to use something like this?
>
> CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES AND VIEWS AND TRIGGERS
> 
>
> Without "WITH" it would just copy a table with nothing else.
>
> On Mon, Aug 19, 2024 at 6:10 PM guo Maxwell 
> wrote:
>
>> Hello, everyone:
>> As Jira CASSANDRA-7662 has described, we would like to introduce a new
>> grammar "CREATE TABLE LIKE", which simplifies creating new tables by
>> duplicating existing ones.
>> The format may look like: CREATE TABLE <new_table> LIKE <existing_table>
>>
>> Before I implement this function, do you have any suggestions on this?
>>
>> Looking forward to your reply!
>>
>
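For reference, the PostgreSQL and MySQL forms linked earlier in this thread look like the following. The first two statements are real syntax in those databases; the last line is only the hypothetical CQL grammar sketched in this discussion, not anything implemented in Cassandra:

```sql
-- PostgreSQL: copy column definitions; INCLUDING controls what else is
-- carried over (DEFAULTS, CONSTRAINTS, INDEXES, ... or ALL).
CREATE TABLE tb_copy (LIKE tb INCLUDING ALL);

-- MySQL: copies column definitions and indexes of the source table.
CREATE TABLE tb_copy LIKE tb;

-- Hypothetical CQL form proposed above (not implemented):
-- CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES AND VIEWS AND TRIGGERS;
```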



Housekeeping on CEPs

2024-09-19 Thread Patrick McFadin
Hi everyone,

You might have noticed I just did a pass through the current set of CEPs.
First I have to say, there are some great ones in here and I love the process
we have created. It's a great sign of maturity for the project.

As I was going through I noticed some things to remind everyone about.

If you are the creator of the CEP, please maintain the doc in CWIKI. I
cleaned up a few but there is inconsistent information from DISCUSS, Jira,
and CEP docs. Take a minute and do a few edits.

Every CEP needs a [DISCUSS] and a [VOTE] thread. When those happen, update
the CEP docs in CWIKI.

Thanks!

Patrick


Re: [VOTE] CEP-42: Constraints Framework

2024-09-19 Thread Patrick McFadin
Thanks for the update. My inbox search failed me :D

On Thu, Sep 19, 2024 at 11:31 AM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> Hi Patrick,
>
> Thanks for taking a look at this and keeping the house tidy.
>
> I announced the voting results on a separate thread:
> lists.apache.org
> 
> 
> 
>
> As a follow up, this is not stalled, and I’m currently working on a patch
> that will be soon available for review.
>
> Thanks,
> Bernardo
>
>
> On Sep 19, 2024, at 11:20 AM, Patrick McFadin  wrote:
>
> I'm going to cap this thread. Vote passes with no binding -1s.
>
> On Tue, Jul 2, 2024 at 2:25 PM Jordan West  wrote:
>
>> +1
>>
>> On Tue, Jul 2, 2024 at 12:15 Francisco Guerrero 
>> wrote:
>>
>>> +1
>>>
>>> On 2024/07/02 18:45:33 Josh McKenzie wrote:
>>> > +1
>>> >
>>> > On Tue, Jul 2, 2024, at 1:18 PM, Abe Ratnofsky wrote:
>>> > > +1 (nb)
>>> > >
>>> > >> On Jul 2, 2024, at 12:15 PM, Yifan Cai  wrote:
>>> > >>
>>> > >> +1 on CEP-42.
>>> > >>
>>> > >> - Yifan
>>> > >>
>>> > >> On Tue, Jul 2, 2024 at 5:17 AM Jon Haddad 
>>> wrote:
>>> > >>> +1
>>> > >>>
>>> > >>> On Tue, Jul 2, 2024 at 5:06 AM  wrote:
>>> >  +1
>>> > 
>>> > 
>>> > > On Jul 1, 2024, at 8:34 PM, Doug Rohrer 
>>> wrote:
>>> > >
>>> > > +1 (nb) - Thanks for all of the suggestions and Bernardo for
>>> wrangling the CEP into shape!
>>> > >
>>> > > Doug
>>> > >
>>> > >> On Jul 1, 2024, at 3:06 PM, Dinesh Joshi 
>>> wrote:
>>> > >>
>>> > >> +1
>>> > >>
>>> > >> On Mon, Jul 1, 2024 at 11:58 AM Ariel Weisberg <
>>> ar...@weisberg.ws> wrote:
>>> > >>> __
>>> > >>> Hi,
>>> > >>>
>>> > >>> I am +1 on CEP-42 with the latest updates to the CEP to
>>> clarify syntax, error messages, constraint naming and generated naming,
>>> alter/drop, describe etc.
>>> > >>>
>>> > >>> I think this now tracks very closely to how other SQL
>>> databases define constraints and the syntax is easily extensible to
>>> multi-column and multi-table constraints.
>>> > >>>
>>> > >>> Ariel
>>> > >>>
>>> > >>> On Mon, Jul 1, 2024, at 9:48 AM, Bernardo Botella wrote:
>>> >  With all the feedback that came in the discussion thread
>>> after the call for votes, I’d like to extend the period another 72 hours
>>> starting today.
>>> > 
>>> >  As before, a vote passes if there are at least 3 binding +1s
>>> and no binding vetoes.
>>> > 
>>> >  Thanks,
>>> >  Bernardo Botella
>>> > 
>>> > > On Jun 24, 2024, at 7:17 AM, Bernardo Botella <
>>> conta...@bernardobotella.com> wrote:
>>> > >
>>> > > Hi everyone,
>>> > >
>>> > > I would like to start the voting for CEP-42.
>>> > >
>>> > > Proposal:
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
>>> > > Discussion:
>>> https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
>>> > >
>>> > > The vote will be open for 72 hours. A vote passes if there
>>> are at least 3 binding +1s and no binding vetoes.
>>> > >
>>> > > Thanks,
>>> > > Bernardo Botella
>>> > >>>
>>>
>>
>

