Re: [DISCUSS] Secondary Indexes and Single-Partition Reads

2024-10-01 Thread Jon Haddad
This also seems like an optimization. Why not go in 5.0?


On Tue, Oct 1, 2024 at 10:14 PM Jordan West  wrote:

> Agreed this would absolutely be a win. Don't see a need for a flag either.
>
> On Tue, Oct 1, 2024 at 1:31 PM Caleb Rackliffe 
> wrote:
>
>> Alrighty, with what looks like a fair amount of support, I'll declare
>> CASSANDRA-19968 ready for some preliminary review.
>>
>> On Tue, Oct 1, 2024 at 2:41 PM Caleb Rackliffe 
>> wrote:
>>
>>> We did add CASSANDRA-18940 to make sure local SAI post-filtering reads
>>> got picked up somewhere, but you're right that StorageProxy#readRegular()
>>> would start recording some index queries in the normal read metrics.
>>>
>>> On Tue, Oct 1, 2024 at 2:11 PM Jeremiah Jordan <
>>> jeremiah.jor...@gmail.com> wrote:
>>>
>>>> Did we add new metrics for index queries?  The only issue I see is that
>>>> this change will mix index queries into the regular read metrics, where
>>>> before they were in the range metrics, so maybe some changes to metrics
>>>> should go with it.  But I think this is a good change over all.
>>>>
>>>> On Oct 1, 2024 at 1:51:10 PM, Jon Haddad wrote:
>>>>
>>>>> This seems like it's strictly a win.  Doesn't sound to me like a flag
>>>>> is needed.
>>>>>
>>>>> On Tue, Oct 1, 2024 at 2:44 PM Caleb Rackliffe <
>>>>> calebrackli...@gmail.com> wrote:
>>>>>
>>>>>> > (Higher rate of mismatches requiring a second full read? Why would
>>>>>> 2i be more likely?)
>>>>>>
>>>>>> Right, I don't see any reason they should be more likely to actuate
>>>>>> read-repair than slice queries are today...
>>>>>>
>>>>>> Didn't mention this above, but I'd obviously be open to having a
>>>>>> system property that switches this behavior.
>>>>>>
>>>>>> On Tue, Oct 1, 2024 at 12:43 PM Jeff Jirsa wrote:
>>>>>>
>>>>>>>
>>>>>>> > On Oct 1, 2024, at 10:28 AM, Caleb Rackliffe <
>>>>>>> calebrackli...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > Hello fellow secondary index enjoyers!
>>>>>>> >
>>>>>>> > If you're familiar with index queries, you probably know that they
>>>>>>> are treated as range reads no matter what. This is true even if the user
>>>>>>> query restricts results to a single partition. This means that they bypass
>>>>>>> the digest read process that normal single-partition reads do.
>>>>>>>
>>>>>>> TIL.
>>>>>>>
>>>>>>> >
>>>>>>> > While I don't think this is something that we need to consider for
>>>>>>> 5.0, I would be very interested in the next major release being able to use
>>>>>>> proper single-partition reads for partition-restricted index queries,
>>>>>>> allowing them to take advantage of digest reads. (If single partition slice
>>>>>>> queries do it, why not index queries?)
>>>>>>>
>>>>>>> This seems like an obvious yes, so reverse the question - is there
>>>>>>> any reason why we WOULDNT want to do this?
>>>>>>>
>>>>>>> (Higher rate of mismatches requiring a second full read? Why would
>>>>>>> 2i be more likely?)
>>>>>>>


Re: Status of CEP-1

2024-10-01 Thread Josh McKenzie
> However it is used by a number of other features as a dependency such as 
> analytics, backup/restore, repair, metrics, and CDC
It seems like a natural pressure relief valve for moving operations out of a 
core C* node that are well served out of process.

On Tue, Oct 1, 2024, at 4:52 PM, Jeremy Hanna wrote:
> 
> The odd thing about the sidecar is that it wasn’t an end in itself. However 
> it is used by a number of other features as a dependency such as analytics, 
> backup/restore, repair, metrics, and CDC.
> 
> I agree with Jeremiah about a 1.0 shippable version. Is there anything else 
> needed in the current sidecar that would hold it back from being that?
> 
>> On Oct 1, 2024, at 12:22 PM, Jeremiah Jordan  
>> wrote:
>> 
>> I don’t really have an opinion on re-writing the existing one vs closing 
>> that and making a new one.
>> But I do think we should have some CEP describing the "1.0 shippable 
>> version" of the side car that is being proposed, then it can have a VOTE 
>> thread, and there will be no issues voting the release meets the CEP once it 
>> is ready.
>> 
>> -Jeremiah
>> 
>> On Oct 1, 2024 at 7:58:41 AM, Josh McKenzie  wrote:
>>> 
>>>> CEP-1 is still completely relevant and we could send an update
>>> CEP-1 feels really fat compared to all our other CEP's. When you need a 
>>> table to enumerate all the subsets of things you're going to do with 
>>> something so you can keep track of progress... it might be too large. :D
>>> 
>>> If we think we can navigate that, I definitely won't stand in the way. But 
>>> given that the people actively working on it aren't the original authors 
>>> and the shepherd's inactive, ISTM a reboot would be cleaner.
>>> 
>>> On Mon, Sep 30, 2024, at 8:36 PM, Dinesh Joshi wrote:
>>>> CEP-1 is still completely relevant and we could send an update but as it
>>>> stands right now we’ve made a ton of progress and would like to focus on
>>>> getting to a release so it’s real for the community.
>>>>
>>>> On Mon, Sep 30, 2024 at 5:31 PM Patrick McFadin wrote:
>>>>> There are two easy choices.
>>>>>
>>>>> 1 - Re-furbish CEP-1 and start a [DISCUSS] thread
>>>>> 2 - Close out CEP-1 and Propose something fresh and start a [DISCUSS]
>>>>> Thread on that.
>>>>>
>>>>> Do you think there is enough in CEP-1 to keep moving with or is it
>>>>> completely wrong?
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Mon, Sep 30, 2024 at 4:53 PM Francisco Guerrero wrote:
>>>>>> Hi folks,
>>>>>>
>>>>>> I feel I need to update the status of CEP-1 as it currently stands.
>>>>>> For context, the Cassandra Sidecar project has had a steady flow of
>>>>>> contributions in the past couple of years. And there is a steady stream
>>>>>> of upcoming contributions, i.e live migration (CEP-40), CDC (CEP-44),
>>>>>> and many others. However, I believe we need to address one issue
>>>>>> with CEP-1; and that is its scope.
>>>>>> The scope of CEP-1 is too broad, and I would like to propose either
>>>>>> closing on CEP-1 or rescoping it. We have a Sidecar now, it's part of
>>>>>> the foundation, and AFAIK we've pretty much satisfied the 2 goals of
>>>>>> CEP-1 which are listed as "extensible and passes the curl test" and
>>>>>> "provides basic but essential and useful functionality".
>>>>>> CEP-1 was discussed and consensus was achieved in 2018 after
>>>>>> a lot of discussion[4]. CEP-1 contributed to the foundation of the CEP
>>>>>> process. Several JIRAs have been opened and active contribution is
>>>>>> happening in the subproject.
>>>>>> We are getting close to proposing the first release of Sidecar, pending
>>>>>> some trivial fixes needed in the configuration and build processes[1][2][3];
>>>>>> as well as CASSANDRASC-141[5] which will bring authn/authz into Sidecar.
>>>>>> Once we close on CASSANDRASC-141, Sidecar will be ready for the 1.0 release.
>>>>>> Any new major feature to Sidecar would go through the regular CEP process.
>>>>>> Cassandra’s Sidecar usage is not restricted to the Analytics library, however
>>>>>> it does support this use case at the moment. I will not touch on vnode
>>>>>> support in Cassandra Analytics as it deserves its own separate discussion.
>>>>>> We're excited to invite you to a talk on Cassandra Sidecar at the Community
>>>>>> Over Code next week. Join us as we explore the current features and share
>>>>>> what’s on the horizon for Sidecar.
>>>>>>
>>>>>> Looking forward to hearing your thoughts on this proposal.
>>>>>> Best,
>>>>>> ⁃ Francisco
>>>>>> [1] https://issues.apache.org/jira/browse/CASSANDRASC-120
>>>>>> [2] https://issues.apache.org/jira/browse/CASSANDRASC-121
>>>>>> [3] https://issues.apache.org/jira/browse/CASSANDRASC-122
>>>>>> [4] https://lists.apache.org/thread/xyg8n5hkt7xrfqv48k91tx1jwp0pvcpw
>>>>>> [5] https://issues.apache.org/jira/browse/CASSANDRASC-141
>>>>>>
>>>>>>
>>> 


Re: Status of CEP-1

2024-10-01 Thread Jeff Jirsa


> On Oct 1, 2024, at 7:26 PM, Josh McKenzie  wrote:
> 
>> However it is used by a number of other features as a dependency such as 
>> analytics, backup/restore, repair, metrics, and CDC
> It seems like a natural pressure relief valve for moving operations out of a 
> core C* node that are well served out of process.

Yea, but the point of the foundation is to RELEASE software for the public 
good, and the link asserting consensus was Dec 2018, so it’s 5.5 years and no 
releases.

What’s the plan here? 






Re: Status of CEP-1

2024-10-01 Thread guo Maxwell
I have the same question: what’s the plan?

On Wed, Oct 2, 2024 at 10:43 AM Jeff Jirsa wrote:

>
>
> On Oct 1, 2024, at 7:26 PM, Josh McKenzie  wrote:
>
> However it is used by a number of other features as a dependency such as
> analytics, backup/restore, repair, metrics, and CDC
>
> It seems like a natural pressure relief valve for moving operations out of
> a core C* node that are well served out of process.
>
>
> Yea, but the point of the foundation is to RELEASE software for the public
> good, and the link asserting consensus was Dec 2018, so it’s 5.5 years and
> no releases.
>
> What’s the plan here?
>
>
>
>
>


Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-01 Thread Berenguer Blasi

Also,

many Jira reports are built per Jira project, so mixing everything in 
one bag makes them unusable. More importantly, future Jira 
features/integrations/etc. will probably be project-oriented as well, and 
they might become difficult or impossible for us to use.


My 2cts.

On 1/10/24 23:44, Yifan Cai wrote:
I support the idea of having separate Jira projects. Based on my 
experience with both shared namespaces (like Cassandra and Analytics) 
and dedicated namespaces (like Sidecar), I've seen the drawbacks of 
grouping all subproject tickets under a single project, i.e. Cassandra.


When tickets are consolidated in one project, visibility suffers. For 
instance, tickets must have a prefix in their titles, like in this 
example: https://issues.apache.org/jira/browse/CASSANDRA-19927. It's 
not immediately clear that this ticket pertains to the Analytics 
subproject without clicking the link.


Additionally, using just the Cassandra project leads to project 
metadata—such as "components" and "labels"—that may not apply to other 
subprojects. This can create confusion. In contrast, having distinct 
Jira projects ensures that project-specific metadata is well organized 
and relevant.


On the other hand, the Cassandra Sidecar has its own dedicated Jira 
project, which avoids these issues entirely.


- Yifan

On Tue, Oct 1, 2024 at 7:27 AM Brandon Williams  wrote:

CEP-8 says "We suggest distinct Jira projects, one per driver, all to
be created."

Kind Regards,
Brandon

On Tue, Oct 1, 2024 at 9:23 AM Jon Haddad wrote:
>
> My 2 cents - trying to look through C* JIRA right now is kind of awful
> with different projects all mixed in.  Given that the decision to lump
> everything together seems to have been made unilaterally, against the
> VOTE, I'd say we still need to move drivers off CASSANDRA.
>
> Only question is, one for all drivers or one for each driver?
>
> Jon
>
> On Tue, Oct 1, 2024 at 10:16 AM Brandon Williams wrote:
>>
>> What is the status of this thread? Are we looking to move each driver
>> project to its own jira instance, as voted for in CEP-8?
>>
>> Kind Regards,
>> Brandon
>>
>> On Tue, Apr 9, 2024 at 9:29 AM Brandon Williams wrote:
>> >
>> > I am +1 on separate projects as well, but to Abe's point I don't think
>> > it matters now, we had 21 binding votes for CEP-8 which spells this
>> > out.
>> >
>> > Kind Regards,
>> > Brandon
>> >
>> > On Tue, Apr 9, 2024 at 9:24 AM Josh McKenzie wrote:
>> > >
>> > > +1 to separate JIRA projects per subproject. Having workflows distinct
>> > > to each project is reason enough for me, nevermind the global namespace
>> > > pollution that occurs if you pack a bunch of disparate projects together
>> > > into one instance.
>> > >
>> > > On Mon, Apr 8, 2024, at 9:11 PM, Dinesh Joshi wrote:
>> > >
>> > > hi folks - sorry to have dropped the ball on responding to this thread.
>> > >
>> > > My 2 cents are as follows -
>> > >
>> > > 1. Having a separate JIRA project for each sub-project will add
>> > > management overhead. This option, however, allows us to model unique
>> > > workflows for the sub-project.
>> > >
>> > > 2. Managing the sub-project as part of the Cassandra JIRA project would
>> > > imply less management overhead but the sub-project would need to conform
>> > > to the same workflows.
>> > >
>> > > I would pick option 1 unless there is a strong reason and desire to
>> > > manage a separate Jira project. We can always split out the Java Driver
>> > > project if things don't work out. OTOH merging a Jira project is harder.
>> > >
>> > > Thanks,
>> > >
>> > > Dinesh
>> > >
>> > > On Thu, Apr 4, 2024 at 12:45 PM Abe Ratnofsky wrote:
>> > >
>> > > CEP-8 proposes using separate Jira projects per Cassandra sub-project:
>> > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>> > >
>> > > > We suggest distinct Jira projects, one per driver, all to be created.
>> > >
>> > > I don't see any discussion changing that from the [DISCUSS] or vote
>> > > threads:
>> > > https://lists.apache.org/thread/01pljcncyjyo467l5orh8nf9okrh7oxm
>> > > https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
>> > > https://lists.apache.org/thread/crolkrhd4y6tt3k4hsy204xomshlcp4p
>> > >
>> > > But looks like upon acceptance that was changed:
>> > > https://lists.apache.org/thread/dhov01s8dvvh3882oxhkmmfv4tqdd68o
>> > >
>> > > > New issues will be tracked under the CASSANDRA project on Apache’s
>> > > > JIRA under the component ‘Client/java-driver’.
>> > >
>> > > I'm in favor of using the same Jira as Cassandra proper. Committership
>> > > is project-wide, so having a stan

Re: Status of CEP-1

2024-10-01 Thread Jeremiah Jordan
 I don’t really have an opinion on re-writing the existing one vs closing
that and making a new one.
But I do think we should have some CEP describing the "1.0 shippable
version" of the side car that is being proposed, then it can have a VOTE
thread, and there will be no issues voting the release meets the CEP once
it is ready.

-Jeremiah

On Oct 1, 2024 at 7:58:41 AM, Josh McKenzie  wrote:

> CEP-1 is still completely relevant and we could send an update
>
> CEP-1 feels really fat compared to all our other CEP's. When you need a
> table to enumerate all the subsets of things you're going to do with
> something so you can keep track of progress... it might be too large. :D
>
> If we think we can navigate that, I definitely won't stand in the way. But
> given that the people actively working on it aren't the original authors
> and the shepherd's inactive, ISTM a reboot would be cleaner.
>
> On Mon, Sep 30, 2024, at 8:36 PM, Dinesh Joshi wrote:
>
> CEP-1 is still completely relevant and we could send an update but as it
> stands right now we’ve made a ton of progress and would like to focus on
> getting to a release so it’s real for the community.
>
> On Mon, Sep 30, 2024 at 5:31 PM Patrick McFadin 
> wrote:
>
> There are two easy choices.
>
> 1 - Re-furbish CEP-1 and start a [DISCUSS] thread
> 2 - Close out CEP-1 and Propose something fresh and start a [DISCUSS]
> Thread on that.
>
> Do you think there is enough in CEP-1 to keep moving with or is it
> completely wrong?
>
> Patrick
>
> On Mon, Sep 30, 2024 at 4:53 PM Francisco Guerrero 
> wrote:
>
> Hi folks,
>
> I feel I need to update the status of CEP-1 as it currently stands.
> For context, the Cassandra Sidecar project has had a steady flow of
> contributions in the past couple of years. And there is a steady stream
> of upcoming contributions, i.e live migration (CEP-40), CDC (CEP-44),
> and many others. However, I believe we need to address one issue
> with CEP-1; and that is its scope.
> The scope of CEP-1 is too broad, and I would like to propose either
> closing on CEP-1 or rescoping it. We have a Sidecar now, it's part of
> the foundation, and AFAIK we've pretty much satisfied the 2 goals of
> CEP-1 which are listed as "extensible and passes the curl test" and
> "provides basic but essential and useful functionality".
> CEP-1 was discussed and consensus was achieved in 2018 after
> a lot of discussion[4]. CEP-1 contributed to the foundation of the CEP
> process. Several JIRAs have been opened and active contribution is
> happening in the subproject.
> We are getting close to proposing the first release of Sidecar, pending
> some trivial fixes needed in the configuration and build
> processes[1][2][3];
> as well as CASSANDRASC-141[5] which will bring authn/authz into Sidecar.
> Once
> we close on CASSANDRASC-141, Sidecar will be ready for the 1.0 release.
> Any new major feature to Sidecar would go through the regular CEP process.
> Cassandra’s Sidecar usage is not restricted to the Analytics library,
> however
> it does support this use case at the moment. I will not touch on vnode
> support in Cassandra Analytics as it deserves its own separate discussion.
> We're excited to invite you to a talk on Cassandra Sidecar at the Community
> Over Code next week. Join us as we explore the current features and share
> what’s on the horizon for Sidecar.
>
> Looking forward to hearing your thoughts on this proposal.
> Best,
> ⁃ Francisco
> [1] https://issues.apache.org/jira/browse/CASSANDRASC-120
> [2] https://issues.apache.org/jira/browse/CASSANDRASC-121
> [3] https://issues.apache.org/jira/browse/CASSANDRASC-122
> [4] https://lists.apache.org/thread/xyg8n5hkt7xrfqv48k91tx1jwp0pvcpw
> [5] https://issues.apache.org/jira/browse/CASSANDRASC-141
>
>
>
>


[DISCUSS] Secondary Indexes and Single-Partition Reads

2024-10-01 Thread Caleb Rackliffe
Hello fellow secondary index enjoyers!

If you're familiar with index queries, you probably know that they are
treated as range reads no matter what. This is true even if the user query
restricts results to a single partition. This means that they bypass the
digest read process that normal single-partition reads do.

While I don't think this is something that we need to consider for 5.0, I
would be very interested in the next major release being able to use proper
single-partition reads for partition-restricted index queries, allowing
them to take advantage of digest reads. (If single partition slice queries
do it, why not index queries?)

To that end, I've come up w/ a patch here with CI results.
(Better to have something concrete to talk about, and to illustrate that
legacy 2i, SASI, and SAI didn't require much effort to fix.)

What do you think?
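
For readers less familiar with the digest read process mentioned above, here
is a rough, self-contained sketch of the difference being discussed. It is a
toy model and not Cassandra's implementation: the function names, the
dict-based "replicas", and the use of MD5 are all assumptions made for
illustration. It only shows the idea that a single-partition read can fetch
full data from one replica and cheap digests from the others, while a
range-style read pulls full data from every replica it contacts.

```python
import hashlib


def digest(rows):
    """Hash a replica's result set (hypothetical stand-in for a digest response)."""
    h = hashlib.md5()
    for key, (value, ts) in sorted(rows.items()):
        h.update(f"{key}={value}@{ts};".encode())
    return h.hexdigest()


def reconcile(results):
    """Merge full results from several replicas, newest timestamp wins per key."""
    merged = {}
    for rows in results:
        for key, (value, ts) in rows.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return merged


def single_partition_read(replicas):
    """Digest-style read: full data from one replica, digests from the rest."""
    data = replicas[0]
    expected = digest(data)
    if all(digest(other) == expected for other in replicas[1:]):
        return data, "digest_match"
    # Mismatch: fall back to a second round of full reads and reconcile.
    return reconcile(replicas), "digest_mismatch"


def range_style_read(replicas):
    """Range-style read: full data from every contacted replica, always merged."""
    return reconcile(replicas), "full_reads"
```

In this toy model, the "second full read" Jeff asks about later in the thread
corresponds to the digest_mismatch branch: it only costs extra when replicas
actually disagree.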


Re: [DISCUSS] Secondary Indexes and Single-Partition Reads

2024-10-01 Thread Jeff Jirsa



> On Oct 1, 2024, at 10:28 AM, Caleb Rackliffe  wrote:
> 
> Hello fellow secondary index enjoyers!
> 
> If you're familiar with index queries, you probably know that they are 
> treated as range reads no matter what. This is true even if the user query 
> restricts results to a single partition. This means that they bypass the 
> digest read process that normal single-partition reads do.

TIL.

> 
> While I don't think this is something that we need to consider for 5.0, I 
> would be very interested in the next major release being able to use proper 
> single-partition reads for partition-restricted index queries, allowing them 
> to take advantage of digest reads. (If single partition slice queries do it, 
> why not index queries?)

This seems like an obvious yes, so reverse the question - is there any reason 
why we WOULDNT want to do this? 

(Higher rate of mismatches requiring a second full read? Why would 2i be more 
likely?)



Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-01 Thread Jon Haddad
My 2 cents - trying to look through C* JIRA right now is kind of awful with
different projects all mixed in.  Given that the decision to lump
everything together seems to have been made unilaterally, against the VOTE,
I'd say we still need to move drivers off CASSANDRA.

Only question is, one for all drivers or one for each driver?

Jon

On Tue, Oct 1, 2024 at 10:16 AM Brandon Williams  wrote:

> What is the status of this thread? Are we looking to move each driver
> project to its own jira instance, as voted for in CEP-8?
>
> Kind Regards,
> Brandon
>
> On Tue, Apr 9, 2024 at 9:29 AM Brandon Williams  wrote:
> >
> > I am +1 on separate projects as well, but to Abe's point I don't think
> > it matters now, we had 21 binding votes for CEP-8 which spells this
> > out.
> >
> > Kind Regards,
> > Brandon
> >
> > On Tue, Apr 9, 2024 at 9:24 AM Josh McKenzie 
> wrote:
> > >
> > > +1 to separate JIRA projects per subproject. Having workflows distinct
> to each project is reason enough for me, nevermind the global namespace
> pollution that occurs if you pack a bunch of disparate projects together
> into one instance.
> > >
> > > On Mon, Apr 8, 2024, at 9:11 PM, Dinesh Joshi wrote:
> > >
> > > hi folks - sorry to have dropped the ball on responding to this thread.
> > >
> > > My 2 cents are as follows -
> > >
> > > 1. Having a separate JIRA project for each sub-project will add
> management overhead. This option, however, allows us to model unique
> workflows for the sub-project.
> > >
> > > 2. Managing the sub-project as part of the Cassandra JIRA project
> would imply less management overhead but the sub-project would need to
> conform to the same workflows.
> > >
> > > I would pick option 1 unless there is a strong reason and desire to
> manage a separate Jira project. We can always split out the Java Driver
> project if things don't work out. OTOH merging a Jira project is harder.
> > >
> > > Thanks,
> > >
> > > Dinesh
> > >
> > > On Thu, Apr 4, 2024 at 12:45 PM Abe Ratnofsky  wrote:
> > >
> > > CEP-8 proposes using separate Jira projects per Cassandra sub-project:
> > >
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
> > >
> > > > We suggest distinct Jira projects, one per driver, all to be created.
> > >
> > > I don't see any discussion changing that from the [DISCUSS] or vote
> threads:
> > > https://lists.apache.org/thread/01pljcncyjyo467l5orh8nf9okrh7oxm
> > > https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
> > > https://lists.apache.org/thread/crolkrhd4y6tt3k4hsy204xomshlcp4p
> > >
> > > But looks like upon acceptance that was changed:
> > > https://lists.apache.org/thread/dhov01s8dvvh3882oxhkmmfv4tqdd68o
> > >
> > > > > New issues will be tracked under the CASSANDRA project on Apache’s
> > > > > JIRA under the component ‘Client/java-driver’.
> > > >
> > > > I'm in favor of using the same Jira as Cassandra proper. Committership
> > > > is project-wide, so having a standardized process (same ticket flow,
> > > > review rules, labels, etc.) is beneficial. But multiple votes happened
> > > > based on the content of the CEP, so we should stick to what was voted on
> > > > and move to a separate Jira.
> > >
> > > --
> > > Abe
> > >
> > >
>


Re: Status of CEP-1

2024-10-01 Thread Josh McKenzie
> CEP-1 is still completely relevant and we could send an update
CEP-1 feels really fat compared to all our other CEP's. When you need a table 
to enumerate all the subsets of things you're going to do with something so you 
can keep track of progress... it might be too large. :D

If we think we can navigate that, I definitely won't stand in the way. But 
given that the people actively working on it aren't the original authors and 
the shepherd's inactive, ISTM a reboot would be cleaner.

On Mon, Sep 30, 2024, at 8:36 PM, Dinesh Joshi wrote:
> CEP-1 is still completely relevant and we could send an update but as it 
> stands right now we’ve made a ton of progress and would like to focus on 
> getting to a release so it’s real for the community.
> 
> On Mon, Sep 30, 2024 at 5:31 PM Patrick McFadin  wrote:
>> There are two easy choices.
>> 
>> 1 - Re-furbish CEP-1 and start a [DISCUSS] thread
>> 2 - Close out CEP-1 and Propose something fresh and start a [DISCUSS] Thread 
>> on that.
>> 
>> Do you think there is enough in CEP-1 to keep moving with or is it 
>> completely wrong?
>> 
>> Patrick
>> 
>> On Mon, Sep 30, 2024 at 4:53 PM Francisco Guerrero  
>> wrote:
>>> Hi folks,
>>> 
>>> I feel I need to update the status of CEP-1 as it currently stands.
>>> For context, the Cassandra Sidecar project has had a steady flow of
>>> contributions in the past couple of years. And there is a steady stream
>>> of upcoming contributions, i.e live migration (CEP-40), CDC (CEP-44),
>>> and many others. However, I believe we need to address one issue
>>> with CEP-1; and that is its scope.
>>> The scope of CEP-1 is too broad, and I would like to propose either
>>> closing on CEP-1 or rescoping it. We have a Sidecar now, it's part of
>>> the foundation, and AFAIK we've pretty much satisfied the 2 goals of
>>> CEP-1 which are listed as "extensible and passes the curl test" and
>>> "provides basic but essential and useful functionality".
>>> CEP-1 was discussed and consensus was achieved in 2018 after
>>> a lot of discussion[4]. CEP-1 contributed to the foundation of the CEP
>>> process. Several JIRAs have been opened and active contribution is
>>> happening in the subproject.
>>> We are getting close to proposing the first release of Sidecar, pending
>>> some trivial fixes needed in the configuration and build processes[1][2][3];
>>> as well as CASSANDRASC-141[5] which will bring authn/authz into Sidecar. 
>>> Once
>>> we close on CASSANDRASC-141, Sidecar will be ready for the 1.0 release.
>>> Any new major feature to Sidecar would go through the regular CEP process.
>>> Cassandra’s Sidecar usage is not restricted to the Analytics library, 
>>> however
>>> it does support this use case at the moment. I will not touch on vnode
>>> support in Cassandra Analytics as it deserves its own separate discussion.
>>> We're excited to invite you to a talk on Cassandra Sidecar at the Community
>>> Over Code next week. Join us as we explore the current features and share
>>> what’s on the horizon for Sidecar.
>>> 
>>> Looking forward to hearing your thoughts on this proposal.
>>> Best,
>>> ⁃ Francisco
>>> [1] https://issues.apache.org/jira/browse/CASSANDRASC-120
>>> [2] https://issues.apache.org/jira/browse/CASSANDRASC-121
>>> [3] https://issues.apache.org/jira/browse/CASSANDRASC-122
>>> [4] https://lists.apache.org/thread/xyg8n5hkt7xrfqv48k91tx1jwp0pvcpw
>>> [5] https://issues.apache.org/jira/browse/CASSANDRASC-141
>>> 
>>> 


Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-01 Thread Brandon Williams
CEP-8 says "We suggest distinct Jira projects, one per driver, all to
be created."

Kind Regards,
Brandon

On Tue, Oct 1, 2024 at 9:23 AM Jon Haddad  wrote:
>
> My 2 cents - trying to look through C* JIRA right now is kind of awful with 
> different projects all mixed in.  Given that the decision to lump everything 
> together seems to have been made unilaterally, against the VOTE, I'd say we 
> still need to move drivers off CASSANDRA.
>
> Only question is, one for all drivers or one for each driver?
>
> Jon
>
> On Tue, Oct 1, 2024 at 10:16 AM Brandon Williams  wrote:
>>
>> What is the status of this thread? Are we looking to move each driver
>> project to its own jira instance, as voted for in CEP-8?
>>
>> Kind Regards,
>> Brandon
>>
>> On Tue, Apr 9, 2024 at 9:29 AM Brandon Williams  wrote:
>> >
>> > I am +1 on separate projects as well, but to Abe's point I don't think
>> > it matters now, we had 21 binding votes for CEP-8 which spells this
>> > out.
>> >
>> > Kind Regards,
>> > Brandon
>> >
>> > On Tue, Apr 9, 2024 at 9:24 AM Josh McKenzie  wrote:
>> > >
>> > > +1 to separate JIRA projects per subproject. Having workflows distinct 
>> > > to each project is reason enough for me, nevermind the global namespace 
>> > > pollution that occurs if you pack a bunch of disparate projects together 
>> > > into one instance.
>> > >
>> > > On Mon, Apr 8, 2024, at 9:11 PM, Dinesh Joshi wrote:
>> > >
>> > > hi folks - sorry to have dropped the ball on responding to this thread.
>> > >
>> > > My 2 cents are as follows -
>> > >
>> > > 1. Having a separate JIRA project for each sub-project will add 
>> > > management overhead. This option, however, allows us to model unique 
>> > > workflows for the sub-project.
>> > >
>> > > 2. Managing the sub-project as part of the Cassandra JIRA project would 
>> > > imply less management overhead but the sub-project would need to conform 
>> > > to the same workflows.
>> > >
>> > > I would pick option 1 unless there is a strong reason and desire to 
>> > > manage a separate Jira project. We can always split out the Java Driver 
>> > > project if things don't work out. OTOH merging a Jira project is harder.
>> > >
>> > > Thanks,
>> > >
>> > > Dinesh
>> > >
>> > > On Thu, Apr 4, 2024 at 12:45 PM Abe Ratnofsky  wrote:
>> > >
>> > > CEP-8 proposes using separate Jira projects per Cassandra sub-project:
>> > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>> > >
>> > > > We suggest distinct Jira projects, one per driver, all to be created.
>> > >
>> > > I don't see any discussion changing that from the [DISCUSS] or vote 
>> > > threads:
>> > > https://lists.apache.org/thread/01pljcncyjyo467l5orh8nf9okrh7oxm
>> > > https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
>> > > https://lists.apache.org/thread/crolkrhd4y6tt3k4hsy204xomshlcp4p
>> > >
>> > > But looks like upon acceptance that was changed:
>> > > https://lists.apache.org/thread/dhov01s8dvvh3882oxhkmmfv4tqdd68o
>> > >
>> > > > New issues will be tracked under the CASSANDRA project on Apache’s 
>> > > > JIRA under the component ‘Client/java-driver’.
>> > >
>> > > I'm in favor of using the same Jira as Cassandra proper. Committership 
>> > > is project-wide, so having a standardized process (same ticket flow, 
>> > > review rules, labels, etc.) is beneficial. But multiple votes happened 
>> > > based on the content of the CEP, so we should stick to what was voted on 
>> > > and move to a separate Jira.
>> > >
>> > > --
>> > > Abe
>> > >
>> > >


Re: DefaultDriverOption default values

2024-10-01 Thread Abe Ratnofsky
The user list is a more appropriate place for this kind of question.

Here’s the default reference.conf configuration file: 
https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/resources/reference.conf

Docs on configuration here: 
https://github.com/apache/cassandra-java-driver/tree/4.x/manual/core/configuration

> On Oct 1, 2024, at 06:02, Sébastien Rebecchi  wrote:
> 
> 
> Hello
> 
> I am using the DataStax Java driver for Apache Cassandra with the 
> programmatic session builder.
> I can't find the default values of the options listed in 
> DefaultDriverOption 
> (https://docs.datastax.com/en/drivers/java/4.13/com/datastax/oss/driver/api/core/config/DefaultDriverOption.html),
> for example the default value of 
> "advanced.connection.connect-timeout", i.e. the value used when 
> CONNECTION_CONNECT_TIMEOUT is not explicitly set on the programmatic 
> session builder.
> I am not sure if I am asking on the right mailing list; any help would be 
> appreciated.
> 
> Best regards,
> Sébastien.
> 


Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-10-01 Thread James Berragan
It seems this has triggered some important discussions about CEP-1 and the
Sidecar. Let's keep those in their respective threads and focus this
conversation on CEP-44.

Patrick, I think I missed your point "There is also little mention of where
the increased resource load would be handled." - you're right, running CDC
in the Sidecar implicitly means it uses additional resources in the C*
cluster. This resource usage is proportional to the write throughput, so
it's not suitable for use cases with very high write throughput, but our
experience has been that for standard mixed workloads the overhead is
minimal. The throttling built in safely handles burst workloads.

James.

On Mon, 30 Sept 2024 at 14:22, Josh McKenzie  wrote:

> This is the type of hidden subproject that will get us into trouble with
> the board/foundation.   I'm sure it's getting enough committer eyeballs,
> and some PMC oversight, but maybe not enough.
>
> I don't agree with the qualifier of it as being hidden. It's definitely
> lower traffic than the main project but there's movement on the JIRA here:
> link.
>
> I assume the sidecar is going to take longer to reach a tipping point
> where more people start contributing to it until it has compelling features
> that'll incentivize folks running their own bespoke sidecars to migrate
> over.
>
> Agree with all your points Jon; there's a lot to be done on it.
>
> CEP-1 is pretty much abandoned yeah. I think it'd be reasonable to close
> it down and open up a new one w/active contributors + active shepherd and a
> much more limited scope.
>
> On Mon, Sep 30, 2024, at 2:13 PM, Patrick McFadin wrote:
>
> I'm mentioning it because I was surprised and I feel like I generally have
> a finger on the pulse of the project.
>
> I would love to talk about it more and get more community support and
> interest.
>
> On Mon, Sep 30, 2024 at 11:01 AM Mick Semb Wever  wrote:
>
> Agree with Jon, Josh and Patrick here.
>
> This is the type of hidden subproject that will get us into trouble with
> the board/foundation.   I'm sure it's getting enough committer eyeballs,
> and some PMC oversight, but maybe not enough.  Addressing the more material
> points that Jon mentions is the best way to deal with that IMHO.
>
>
>
> On Mon, 30 Sept 2024 at 20:37, Jon Haddad  wrote:
>
> I think it depends on what lens you're looking at the sidecar through.
>
> If you're actively working on it, and pulling it into your own infra,
> sure.  It's a thing.
>
> If you're an outsider?  I have a hard time seeing it.
>
> - No documentation as to what it does
> - No releases
> - No build instructions
> - Trying to build using standard gradle commands fails [1]
> - Included configs don't work out of the box. [2][3]
> - CEP-1 looks abandoned
> - The primary reason right now to use it looks to be analytics library,
> which doesn't work for most teams due to lack of vnode support [4]
>
> I think if you were to take a poll of 100 users outside this ML, I'd bet
> almost every one would agree the sidecar isn't a thing yet, and that's
> probably more important than if it's actually getting worked on.  I think
> it has quite a ways to go before it looks to be more than an idea that's
> incubating.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRASC-120
> [2] https://issues.apache.org/jira/browse/CASSANDRASC-121
> [3] https://issues.apache.org/jira/browse/CASSANDRASC-122
> [4] https://issues.apache.org/jira/browse/CASSANDRA-19594
>
>
> On Mon, Sep 30, 2024 at 11:14 AM Josh McKenzie 
> wrote:
>
>
> The CEP for the sidecar has stalled. The sidecar itself is very much alive
> and a thing.
>
> CEP != artifact.
>
> We should definitely clean that up though.
>
> On Mon, Sep 30, 2024, at 10:59 AM, Dinesh Joshi wrote:
>
> Patrick, could you please elaborate? The Sidecar has been a thing for a
> while now.
>
> On Mon, Sep 30, 2024 at 7:51 AM Patrick McFadin 
> wrote:
>
> I made the mistake of asking two things in one email.
>
> First thing I asked. Sidecar? Stalled CEP so why is this being talked
> about like this is a thing?
>
> On Mon, Sep 30, 2024 at 7:21 AM Benedict  wrote:
>
>
> Sorry Bernardo, you may have misunderstood me. I don’t have any concerns,
> I was suggesting a possible future scenario where CDC for Kafka via sidecar
> is changed to use a hypothetical future topic subscription service provided
> by C*. It was meant to show that this CEP may be easily decoupled from any
> future evolution in this area.
>
>
> On 30 Sep 2024, at 14:58, Bernardo Botella 
> wrote:
>
> Thanks everyone for the comments.
>
>
> Patrick:
> The proposal includes a “best effort” approach for deduplication (some
> details can be found on the Digest class comments on the PR here
> https://github.com/apache/cassandra-analytics/pull/87/files#diff-3a09caecc1da13419d92cde56a7cfc7d253faac08182e6c2768b3d32c015de82R185-R193
> 

Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-01 Thread Brandon Williams
What is the status of this thread? Are we looking to move each driver
project to its own jira instance, as voted for in CEP-8?

Kind Regards,
Brandon

On Tue, Apr 9, 2024 at 9:29 AM Brandon Williams  wrote:
>
> I am +1 on separate projects as well, but to Abe's point I don't think
> it matters now, we had 21 binding votes for CEP-8 which spells this
> out.
>
> Kind Regards,
> Brandon
>
> On Tue, Apr 9, 2024 at 9:24 AM Josh McKenzie  wrote:
> >
> > +1 to separate JIRA projects per subproject. Having workflows distinct to 
> > each project is reason enough for me, nevermind the global namespace 
> > pollution that occurs if you pack a bunch of disparate projects together 
> > into one instance.
> >
> > On Mon, Apr 8, 2024, at 9:11 PM, Dinesh Joshi wrote:
> >
> > hi folks - sorry to have dropped the ball on responding to this thread.
> >
> > My 2 cents are as follows -
> >
> > 1. Having a separate JIRA project for each sub-project will add management 
> > overhead. This option, however, allows us to model unique workflows for the 
> > sub-project.
> >
> > 2. Managing the sub-project as part of the Cassandra JIRA project would 
> > imply less management overhead but the sub-project would need to conform to 
> > the same workflows.
> >
> > I would pick option 1 unless there is a strong reason and desire to manage 
> > a separate Jira project. We can always split out the Java Driver project if 
> > things don't work out. OTOH merging a Jira project is harder.
> >
> > Thanks,
> >
> > Dinesh
> >
> > On Thu, Apr 4, 2024 at 12:45 PM Abe Ratnofsky  wrote:
> >
> > CEP-8 proposes using separate Jira projects per Cassandra sub-project:
> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
> >
> > > We suggest distinct Jira projects, one per driver, all to be created.
> >
> > I don't see any discussion changing that from the [DISCUSS] or vote threads:
> > https://lists.apache.org/thread/01pljcncyjyo467l5orh8nf9okrh7oxm
> > https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
> > https://lists.apache.org/thread/crolkrhd4y6tt3k4hsy204xomshlcp4p
> >
> > But looks like upon acceptance that was changed:
> > https://lists.apache.org/thread/dhov01s8dvvh3882oxhkmmfv4tqdd68o
> >
> > > New issues will be tracked under the CASSANDRA project on Apache’s JIRA 
> > >  under the component 
> > > ‘Client/java-driver’.
> >
> > I'm in favor of using the same Jira as Cassandra proper. Committership is 
> > project-wide, so having a standardized process (same ticket flow, review 
> > rules, labels, etc.) is beneficial. But multiple votes happened based on 
> > the content of the CEP, so we should stick to what was voted on and move to 
> > a separate Jira.
> >
> > --
> > Abe
> >
> >


Re: [DISCUSS] Secondary Indexes and Single-Partition Reads

2024-10-01 Thread Jon Haddad
This seems like it's strictly a win.  Doesn't sound to me like a flag is
needed.

On Tue, Oct 1, 2024 at 2:44 PM Caleb Rackliffe 
wrote:

> > (Higher rate of mismatches requiring a second full read? Why would 2i be
> more likely?)
>
> Right, I don't see any reason they should be more likely to actuate
> read-repair than slice queries are today...
>
> Didn't mention this above, but I'd obviously be open to having a system
> property that switches this behavior.
>
> On Tue, Oct 1, 2024 at 12:43 PM Jeff Jirsa  wrote:
>
>>
>>
>> > On Oct 1, 2024, at 10:28 AM, Caleb Rackliffe 
>> wrote:
>> >
>> > Hello fellow secondary index enjoyers!
>> >
>> > If you're familiar with index queries, you probably know that they are
>> treated as range reads no matter what. This is true even if the user query
>> restricts results to a single partition. This means that they bypass the
>> digest read process that normal single-partition reads do.
>>
>> TIL.
>>
>> >
>> > While I don't think this is something that we need to consider for 5.0,
>> I would be very interested in the next major release being able to use
>> proper single-partition reads for partition-restricted index queries,
>> allowing them to take advantage of digest reads. (If single partition slice
>> queries do it, why not index queries?)
>>
>> This seems like an obvious yes, so reverse the question - is there any
>> reason why we WOULDNT want to do this?
>>
>> (Higher rate of mismatches requiring a second full read? Why would 2i be
>> more likely?)
>>
>>
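
For readers who have not met digest reads before, the idea quoted above can be sketched as follows. This is a simplified model in Java, not Cassandra's code: the Replica type and the digestsMatch helper are hypothetical, and SHA-256 is used only because it ships with the JDK. The coordinator asks one replica for the full data and the others for a hash of their result; only a mismatch forces the second, full read that Jeff asks about.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a coordinator-side digest read; not Cassandra's actual code.
final class DigestReadSketch {

    // Stand-in for one replica's local result for a single partition.
    static final class Replica {
        private final String partitionData;

        Replica(String partitionData) {
            this.partitionData = partitionData;
        }

        String readData() {
            return partitionData;
        }

        byte[] readDigest() {
            return digest(partitionData);
        }
    }

    static byte[] digest(String data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    // True when every digest replica agrees with the data replica.
    // False is the case that would require full reads from all replicas.
    static boolean digestsMatch(Replica dataReplica, List<Replica> digestReplicas) {
        byte[] expected = digest(dataReplica.readData());
        for (Replica replica : digestReplicas) {
            if (!Arrays.equals(expected, replica.readDigest())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Replica r1 = new Replica("pk=1|ck=a|v=10");
        Replica r2 = new Replica("pk=1|ck=a|v=10");
        Replica stale = new Replica("pk=1|ck=a|v=9");
        System.out.println("in sync: " + digestsMatch(r1, Arrays.asList(r2)));
        System.out.println("with stale replica: " + digestsMatch(r1, Arrays.asList(r2, stale)));
    }
}
```

When replicas agree, only one of them ships the full result over the network, which is the saving a range read forgoes.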


Re: [DISCUSS] Secondary Indexes and Single-Partition Reads

2024-10-01 Thread Caleb Rackliffe
> (Higher rate of mismatches requiring a second full read? Why would 2i be
more likely?)

Right, I don't see any reason they should be more likely to actuate
read-repair than slice queries are today...

Didn't mention this above, but I'd obviously be open to having a system
property that switches this behavior.

On Tue, Oct 1, 2024 at 12:43 PM Jeff Jirsa  wrote:

>
>
> > On Oct 1, 2024, at 10:28 AM, Caleb Rackliffe 
> wrote:
> >
> > Hello fellow secondary index enjoyers!
> >
> > If you're familiar with index queries, you probably know that they are
> treated as range reads no matter what. This is true even if the user query
> restricts results to a single partition. This means that they bypass the
> digest read process that normal single-partition reads do.
>
> TIL.
>
> >
> > While I don't think this is something that we need to consider for 5.0,
> I would be very interested in the next major release being able to use
> proper single-partition reads for partition-restricted index queries,
> allowing them to take advantage of digest reads. (If single partition slice
> queries do it, why not index queries?)
>
> This seems like an obvious yes, so reverse the question - is there any
> reason why we WOULDNT want to do this?
>
> (Higher rate of mismatches requiring a second full read? Why would 2i be
> more likely?)
>
>


Re: [DISCUSS] Secondary Indexes and Single-Partition Reads

2024-10-01 Thread Caleb Rackliffe
We did add CASSANDRA-18940
 to make sure local
SAI post-filtering reads got picked up somewhere, but you're right that
StorageProxy#readRegular() would start recording some index queries in the
normal read metrics.

On Tue, Oct 1, 2024 at 2:11 PM Jeremiah Jordan 
wrote:

> Did we add new metrics for index queries?  The only issue I see is that
> this change will mix index queries into the regular read metrics, where
> before they were in the range metrics, so maybe some changes to metrics
> should go with it.  But I think this is a good change over all.
>
> On Oct 1, 2024 at 1:51:10 PM, Jon Haddad  wrote:
>
>> This seems like it's strictly a win.  Doesn't sound to me like a flag is
>> needed.
>>
>> On Tue, Oct 1, 2024 at 2:44 PM Caleb Rackliffe 
>> wrote:
>>
>>> > (Higher rate of mismatches requiring a second full read? Why would 2i
>>> be more likely?)
>>>
>>> Right, I don't see any reason they should be more likely to actuate
>>> read-repair than slice queries are today...
>>>
>>> Didn't mention this above, but I'd obviously be open to having a system
>>> property that switches this behavior.
>>>
>>> On Tue, Oct 1, 2024 at 12:43 PM Jeff Jirsa  wrote:
>>>


>>>> > On Oct 1, 2024, at 10:28 AM, Caleb Rackliffe <
>>>> calebrackli...@gmail.com> wrote:
>>>> >
>>>> > Hello fellow secondary index enjoyers!
>>>> >
>>>> > If you're familiar with index queries, you probably know that they
>>>> are treated as range reads no matter what. This is true even if the user
>>>> query restricts results to a single partition. This means that they bypass
>>>> the digest read process that normal single-partition reads do.
>>>>
>>>> TIL.
>>>>
>>>> >
>>>> > While I don't think this is something that we need to consider for
>>>> 5.0, I would be very interested in the next major release being able to use
>>>> proper single-partition reads for partition-restricted index queries,
>>>> allowing them to take advantage of digest reads. (If single partition slice
>>>> queries do it, why not index queries?)
>>>>
>>>> This seems like an obvious yes, so reverse the question - is there any
>>>> reason why we WOULDNT want to do this?
>>>>
>>>> (Higher rate of mismatches requiring a second full read? Why would 2i
>>>> be more likely?)




Re: [DISCUSS] Secondary Indexes and Single-Partition Reads

2024-10-01 Thread Jeremiah Jordan
 Did we add new metrics for index queries?  The only issue I see is that
this change will mix index queries into the regular read metrics, where
before they were in the range metrics, so maybe some changes to metrics
should go with it.  But I think this is a good change over all.
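
One way to picture the metrics concern, as a minimal sketch with hypothetical counter names (these are not Cassandra's metric names): if index usage is counted independently of the read path, a partition-restricted index query stays visible even after it moves from the range bucket to the regular read bucket.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counters for illustration; these are not Cassandra's metric names.
final class ReadMetricsSketch {
    enum ReadPath { SINGLE_PARTITION, RANGE }

    final LongAdder regularReads = new LongAdder();
    final LongAdder rangeReads = new LongAdder();
    final LongAdder indexReads = new LongAdder();

    // Records one coordinator read; index usage is tracked independently of the read path.
    void record(ReadPath path, boolean usesIndex) {
        if (path == ReadPath.SINGLE_PARTITION) {
            regularReads.increment();
        } else {
            rangeReads.increment();
        }
        if (usesIndex) {
            indexReads.increment();
        }
    }

    public static void main(String[] args) {
        ReadMetricsSketch metrics = new ReadMetricsSketch();
        // Before the change: a partition-restricted index query is counted as a range read.
        metrics.record(ReadPath.RANGE, true);
        // After the change: the same query takes the single-partition path.
        metrics.record(ReadPath.SINGLE_PARTITION, true);
        System.out.println("regular=" + metrics.regularReads.sum()
                + " range=" + metrics.rangeReads.sum()
                + " index=" + metrics.indexReads.sum());
    }
}
```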

On Oct 1, 2024 at 1:51:10 PM, Jon Haddad  wrote:

> This seems like it's strictly a win.  Doesn't sound to me like a flag is
> needed.
>
> On Tue, Oct 1, 2024 at 2:44 PM Caleb Rackliffe 
> wrote:
>
>> > (Higher rate of mismatches requiring a second full read? Why would 2i
>> be more likely?)
>>
>> Right, I don't see any reason they should be more likely to actuate
>> read-repair than slice queries are today...
>>
>> Didn't mention this above, but I'd obviously be open to having a system
>> property that switches this behavior.
>>
>> On Tue, Oct 1, 2024 at 12:43 PM Jeff Jirsa  wrote:
>>
>>>
>>>
>>> > On Oct 1, 2024, at 10:28 AM, Caleb Rackliffe 
>>> wrote:
>>> >
>>> > Hello fellow secondary index enjoyers!
>>> >
>>> > If you're familiar with index queries, you probably know that they are
>>> treated as range reads no matter what. This is true even if the user query
>>> restricts results to a single partition. This means that they bypass the
>>> digest read process that normal single-partition reads do.
>>>
>>> TIL.
>>>
>>> >
>>> > While I don't think this is something that we need to consider for
>>> 5.0, I would be very interested in the next major release being able to use
>>> proper single-partition reads for partition-restricted index queries,
>>> allowing them to take advantage of digest reads. (If single partition slice
>>> queries do it, why not index queries?)
>>>
>>> This seems like an obvious yes, so reverse the question - is there any
>>> reason why we WOULDNT want to do this?
>>>
>>> (Higher rate of mismatches requiring a second full read? Why would 2i be
>>> more likely?)
>>>
>>>


Re: [DISCUSS] Secondary Indexes and Single-Partition Reads

2024-10-01 Thread Caleb Rackliffe
Alrighty, with what looks like a fair amount of support, I'll declare
CASSANDRA-19968  ready
for some preliminary review.

On Tue, Oct 1, 2024 at 2:41 PM Caleb Rackliffe 
wrote:

> We did add CASSANDRA-18940
>  to make sure
> local SAI post-filtering reads got picked up somewhere, but you're right
> that StorageProxy#readRegular() would start recording some index queries
> in the normal read metrics.
>
> On Tue, Oct 1, 2024 at 2:11 PM Jeremiah Jordan 
> wrote:
>
>> Did we add new metrics for index queries?  The only issue I see is that
>> this change will mix index queries into the regular read metrics, where
>> before they were in the range metrics, so maybe some changes to metrics
>> should go with it.  But I think this is a good change over all.
>>
>> On Oct 1, 2024 at 1:51:10 PM, Jon Haddad  wrote:
>>
>>> This seems like it's strictly a win.  Doesn't sound to me like a flag is
>>> needed.
>>>
>>> On Tue, Oct 1, 2024 at 2:44 PM Caleb Rackliffe 
>>> wrote:
>>>
>>>> > (Higher rate of mismatches requiring a second full read? Why would 2i
>>>> be more likely?)
>>>>
>>>> Right, I don't see any reason they should be more likely to actuate
>>>> read-repair than slice queries are today...
>>>>
>>>> Didn't mention this above, but I'd obviously be open to having a system
>>>> property that switches this behavior.
>>>>
>>>> On Tue, Oct 1, 2024 at 12:43 PM Jeff Jirsa  wrote:
>>>>
>>>>>
>>>>>
>>>>> > On Oct 1, 2024, at 10:28 AM, Caleb Rackliffe <
>>>>> calebrackli...@gmail.com> wrote:
>>>>> >
>>>>> > Hello fellow secondary index enjoyers!
>>>>> >
>>>>> > If you're familiar with index queries, you probably know that they
>>>>> are treated as range reads no matter what. This is true even if the user
>>>>> query restricts results to a single partition. This means that they bypass
>>>>> the digest read process that normal single-partition reads do.
>>>>>
>>>>> TIL.
>>>>>
>>>>> >
>>>>> > While I don't think this is something that we need to consider for
>>>>> 5.0, I would be very interested in the next major release being able to
>>>>> use proper single-partition reads for partition-restricted index queries,
>>>>> allowing them to take advantage of digest reads. (If single partition
>>>>> slice queries do it, why not index queries?)
>>>>>
>>>>> This seems like an obvious yes, so reverse the question - is there any
>>>>> reason why we WOULDNT want to do this?
>>>>>
>>>>> (Higher rate of mismatches requiring a second full read? Why would 2i
>>>>> be more likely?)
>>>>>
>>>>>


DefaultDriverOption default values

2024-10-01 Thread Sébastien Rebecchi
Hello

I am using Datastax Java driver for Apache Cassandra, and programmatic
session builder.
I can't find what are the default values of configs listed
in DefaultDriverOption (
https://docs.datastax.com/en/drivers/java/4.13/com/datastax/oss/driver/api/core/config/DefaultDriverOption.html),
for example what is the default value
of "advanced.connection.connect-timeout", i.e.
when CONNECTION_CONNECT_TIMEOUT is not explicitly set on programmatic
session builder.
I am not sure if I am asking on the right mailing lists, any help would be
appreciated.

Best regards,
Sébastien.
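
In the 4.x driver, options that are not set programmatically resolve to the values in the reference.conf file bundled with the driver. The lookup order can be modelled with a small, self-contained Java sketch. The class below is a hypothetical stand-in, not the driver's API, and the 5-second value is an assumption used for illustration that should be checked against the reference.conf of the driver version in use.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for layered driver configuration; not the driver's real classes.
final class LayeredConfigSketch {
    private final Map<String, Duration> referenceDefaults = new HashMap<>();
    private final Map<String, Duration> programmaticOverrides = new HashMap<>();

    LayeredConfigSketch() {
        // Assumed value for illustration; verify against the reference.conf of your driver version.
        referenceDefaults.put("advanced.connection.connect-timeout", Duration.ofSeconds(5));
    }

    // Mirrors setting an option on a programmatic builder.
    LayeredConfigSketch withDuration(String path, Duration value) {
        programmaticOverrides.put(path, value);
        return this;
    }

    // An explicit override wins; otherwise the bundled default applies.
    Duration getDuration(String path) {
        Duration value = programmaticOverrides.getOrDefault(path, referenceDefaults.get(path));
        if (value == null) {
            throw new IllegalArgumentException("No value for " + path);
        }
        return value;
    }

    public static void main(String[] args) {
        String path = "advanced.connection.connect-timeout";
        System.out.println("default:  " + new LayeredConfigSketch().getDuration(path));
        System.out.println("override: "
                + new LayeredConfigSketch().withDuration(path, Duration.ofSeconds(10)).getDuration(path));
    }
}
```

The authoritative defaults are the ones in the driver's own reference.conf, described in the configuration manual linked in the reply to this message.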


Re: Status of CEP-1

2024-10-01 Thread Jeremy Hanna
The odd thing about the sidecar is that it wasn’t an end in itself. However
it is used by a number of other features as a dependency such as analytics,
backup/restore, repair, metrics, and CDC.

I agree with Jeremiah about a 1.0 shippable version. Is there anything else
needed in the current sidecar that would hold it back from being that?

On Oct 1, 2024, at 12:22 PM, Jeremiah Jordan  wrote:

> I don’t really have an opinion on re-writing the existing one vs closing
> that and making a new one.
>
> But I do think we should have some CEP describing the "1.0 shippable
> version" of the side car that is being proposed, then it can have a VOTE
> thread, and there will be no issues voting the release meets the CEP once
> it is ready.
>
> -Jeremiah
>
> On Oct 1, 2024 at 7:58:41 AM, Josh McKenzie  wrote:
>
>>> CEP-1 is still completely relevant and we could send an update
>>
>> CEP-1 feels really fat compared to all our other CEP's. When you need a
>> table to enumerate all the subsets of things you're going to do with
>> something so you can keep track of progress... it might be too large. :D
>>
>> If we think we can navigate that, I definitely won't stand in the way.
>> But given that the people actively working on it aren't the original
>> authors and the shepherd's inactive, ISTM a reboot would be cleaner.
>>
>> On Mon, Sep 30, 2024, at 8:36 PM, Dinesh Joshi wrote:
>>
>>> CEP-1 is still completely relevant and we could send an update but as it
>>> stands right now we’ve made a ton of progress and would like to focus on
>>> getting to a release so it’s real for the community.
>>>
>>> On Mon, Sep 30, 2024 at 5:31 PM Patrick McFadin  wrote:
>>>
>>>> There are two easy choices.
>>>>
>>>> 1 - Re-furbish CEP-1 and start a [DISCUSS] thread
>>>> 2 - Close out CEP-1 and Propose something fresh and start a [DISCUSS]
>>>> Thread on that.
>>>>
>>>> Do you think there is enough in CEP-1 to keep moving with or is it
>>>> completely wrong?
>>>>
>>>> Patrick
>>>>
>>>> On Mon, Sep 30, 2024 at 4:53 PM Francisco Guerrero  wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> I feel I need to update the status of CEP-1 as it currently stands.
>>>>>
>>>>> For context, the Cassandra Sidecar project has had a steady flow of
>>>>> contributions in the past couple of years. And there is a steady stream
>>>>> of upcoming contributions, i.e live migration (CEP-40), CDC (CEP-44),
>>>>> and many others. However, I believe we need to address one issue
>>>>> with CEP-1; and that is its scope.
>>>>>
>>>>> The scope of CEP-1 is too broad, and I would like to propose either
>>>>> closing on CEP-1 or rescoping it. We have a Sidecar now, it's part of
>>>>> the foundation, and AFAIK we've pretty much satisfied the 2 goals of
>>>>> CEP-1 which are listed as "extensible and passes the curl test" and
>>>>> "provides basic but essential and useful functionality".
>>>>>
>>>>> CEP-1 was discussed and consensus was achieved in 2018 after
>>>>> a lot of discussion[4]. CEP-1 contributed to the foundation of the CEP
>>>>> process. Several JIRAs have been opened and active contribution is
>>>>> happening in the subproject.
>>>>>
>>>>> We are getting close to proposing the first release of Sidecar, pending
>>>>> some trivial fixes needed in the configuration and build
>>>>> processes[1][2][3]; as well as CASSANDRASC-141[5] which will bring
>>>>> authn/authz into Sidecar. Once we close on CASSANDRASC-141, Sidecar
>>>>> will be ready for the 1.0 release. Any new major feature to Sidecar
>>>>> would go through the regular CEP process.
>>>>>
>>>>> Cassandra’s Sidecar usage is not restricted to the Analytics library,
>>>>> however it does support this use case at the moment. I will not touch
>>>>> on vnode support in Cassandra Analytics as it deserves its own separate
>>>>> discussion.
>>>>>
>>>>> We're excited to invite you to a talk on Cassandra Sidecar at the
>>>>> Community Over Code next week. Join us as we explore the current
>>>>> features and share what’s on the horizon for Sidecar.
>>>>>
>>>>> Looking forward to hearing your thoughts on this proposal.
>>>>>
>>>>> Best,
>>>>> ⁃ Francisco
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/CASSANDRASC-120
>>>>> [2] https://issues.apache.org/jira/browse/CASSANDRASC-121
>>>>> [3] https://issues.apache.org/jira/browse/CASSANDRASC-122
>>>>> [4] https://lists.apache.org/thread/xyg8n5hkt7xrfqv48k91tx1jwp0pvcpw
>>>>> [5] https://issues.apache.org/jira/browse/CASSANDRASC-141




Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-01 Thread Yifan Cai
I support the idea of having separate Jira projects. Based on my experience
with both shared namespaces (like Cassandra and Analytics) and dedicated
namespaces (like Sidecar), I've seen the drawbacks of grouping all
subproject tickets under a single project, i.e. Cassandra.

When tickets are consolidated in one project, visibility suffers. For
instance, tickets must have a prefix in their titles, like in this example:
https://issues.apache.org/jira/browse/CASSANDRA-19927. It's not immediately
clear that this ticket pertains to the Analytics subproject without
clicking the link.

Additionally, using just the Cassandra project leads to project
metadata—such as "components" and "labels"—that may not apply to other
subprojects. This can create confusion. In contrast, having distinct Jira
projects ensures that project-specific metadata is well organized and
relevant.

On the other hand, the Cassandra Sidecar has its own dedicated Jira
project, which avoids these issues entirely.

- Yifan

On Tue, Oct 1, 2024 at 7:27 AM Brandon Williams  wrote:

> CEP-8 says "We suggest distinct Jira projects, one per driver, all to
> be created."
>
> Kind Regards,
> Brandon
>
> On Tue, Oct 1, 2024 at 9:23 AM Jon Haddad  wrote:
> >
> > My 2 cents - trying to look through C* JIRA right now is kind of awful
> with different projects all mixed in.  Given that the decision to lump
> everything together seems to have been made unilaterally, against the VOTE,
> I'd say we still need to move drivers off CASSANDRA.
> >
> > Only question is, one for all drivers or one for each driver?
> >
> > Jon
> >
> > On Tue, Oct 1, 2024 at 10:16 AM Brandon Williams 
> wrote:
> >>
> >> What is the status of this thread? Are we looking to move each driver
> >> project to its own jira instance, as voted for in CEP-8?
> >>
> >> Kind Regards,
> >> Brandon
> >>
> >> On Tue, Apr 9, 2024 at 9:29 AM Brandon Williams 
> wrote:
> >> >
> >> > I am +1 on separate projects as well, but to Abe's point I don't think
> >> > it matters now, we had 21 binding votes for CEP-8 which spells this
> >> > out.
> >> >
> >> > Kind Regards,
> >> > Brandon
> >> >
> >> > On Tue, Apr 9, 2024 at 9:24 AM Josh McKenzie 
> wrote:
> >> > >
> >> > > +1 to separate JIRA projects per subproject. Having workflows
> distinct to each project is reason enough for me, nevermind the global
> namespace pollution that occurs if you pack a bunch of disparate projects
> together into one instance.
> >> > >
> >> > > On Mon, Apr 8, 2024, at 9:11 PM, Dinesh Joshi wrote:
> >> > >
> >> > > hi folks - sorry to have dropped the ball on responding to this
> thread.
> >> > >
> >> > > My 2 cents are as follows -
> >> > >
> >> > > 1. Having a separate JIRA project for each sub-project will add
> management overhead. This option, however, allows us to model unique
> workflows for the sub-project.
> >> > >
> >> > > 2. Managing the sub-project as part of the Cassandra JIRA project
> would imply less management overhead but the sub-project would need to
> conform to the same workflows.
> >> > >
> >> > > I would pick option 1 unless there is a strong reason and desire to
> manage a separate Jira project. We can always split out the Java Driver
> project if things don't work out. OTOH merging a Jira project is harder.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Dinesh
> >> > >
> >> > > On Thu, Apr 4, 2024 at 12:45 PM Abe Ratnofsky  wrote:
> >> > >
> >> > > CEP-8 proposes using separate Jira projects per Cassandra
> sub-project:
> >> > >
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
> >> > >
> >> > > > We suggest distinct Jira projects, one per driver, all to be
> created.
> >> > >
> >> > > I don't see any discussion changing that from the [DISCUSS] or vote
> threads:
> >> > > https://lists.apache.org/thread/01pljcncyjyo467l5orh8nf9okrh7oxm
> >> > > https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
> >> > > https://lists.apache.org/thread/crolkrhd4y6tt3k4hsy204xomshlcp4p
> >> > >
> >> > > But looks like upon acceptance that was changed:
> >> > > https://lists.apache.org/thread/dhov01s8dvvh3882oxhkmmfv4tqdd68o
> >> > >
> >> > > > New issues will be tracked under the CASSANDRA project on
> Apache’s JIRA  under
> the component ‘Client/java-driver’.
> >> > >
> >> > > I'm in favor of using the same Jira as Cassandra proper.
> Committership is project-wide, so having a standardized process (same
> ticket flow, review rules, labels, etc.) is beneficial. But multiple votes
> happened based on the content of the CEP, so we should stick to what was
> voted on and move to a separate Jira.
> >> > >
> >> > > --
> >> > > Abe
> >> > >
> >> > >
>


Re: [DISCUSS] Secondary Indexes and Single-Partition Reads

2024-10-01 Thread Caleb Rackliffe
It's certainly an improvement/optimization, so I wouldn't object to it
being in 5.0.x. I have no plans to touch the outstanding ways ALLOW
FILTERING is broken until I get to CASSANDRA-19007
, which hopefully
happens soon.

On Tue, Oct 1, 2024 at 9:23 PM Jon Haddad  wrote:

> This also seems like an optimization. Why not go in 5.0?
>
>
> On Tue, Oct 1, 2024 at 10:14 PM Jordan West  wrote:
>
>> Agreed this would absolutely be a win. Don't see need for a flag either.
>>
>> On Tue, Oct 1, 2024 at 1:31 PM Caleb Rackliffe 
>> wrote:
>>
>>> Alrighty, with what looks like a fair amount of support, I'll declare
>>> CASSANDRA-19968  
>>> ready
>>> for some preliminary review.
>>>
>>> On Tue, Oct 1, 2024 at 2:41 PM Caleb Rackliffe 
>>> wrote:
>>>
>>>> We did add CASSANDRA-18940
>>>> to make sure
>>>> local SAI post-filtering reads got picked up somewhere, but you're right
>>>> that StorageProxy#readRegular() would start recording some index
>>>> queries in the normal read metrics.
>>>>
>>>> On Tue, Oct 1, 2024 at 2:11 PM Jeremiah Jordan <
>>>> jeremiah.jor...@gmail.com> wrote:
>>>>
>>>>> Did we add new metrics for index queries?  The only issue I see is
>>>>> that this change will mix index queries into the regular read metrics,
>>>>> where before they were in the range metrics, so maybe some changes to
>>>>> metrics should go with it.  But I think this is a good change over all.
>>>>>
>>>>> On Oct 1, 2024 at 1:51:10 PM, Jon Haddad 
>>>>> wrote:
>>>>>
>>>>>> This seems like it's strictly a win.  Doesn't sound to me like a flag
>>>>>> is needed.
>>>>>>
>>>>>> On Tue, Oct 1, 2024 at 2:44 PM Caleb Rackliffe <
>>>>>> calebrackli...@gmail.com> wrote:
>>>>>>
>>>>>>> > (Higher rate of mismatches requiring a second full read? Why would
>>>>>>> 2i be more likely?)
>>>>>>>
>>>>>>> Right, I don't see any reason they should be more likely to actuate
>>>>>>> read-repair than slice queries are today...
>>>>>>>
>>>>>>> Didn't mention this above, but I'd obviously be open to having a
>>>>>>> system property that switches this behavior.
>>>>>>>
>>>>>>> On Tue, Oct 1, 2024 at 12:43 PM Jeff Jirsa  wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> > On Oct 1, 2024, at 10:28 AM, Caleb Rackliffe <
>>>>>>>> calebrackli...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > Hello fellow secondary index enjoyers!
>>>>>>>> >
>>>>>>>> > If you're familiar with index queries, you probably know that
>>>>>>>> they are treated as range reads no matter what. This is true even if
>>>>>>>> the user query restricts results to a single partition. This means that
>>>>>>>> they bypass the digest read process that normal single-partition reads do.
>>>>>>>>
>>>>>>>> TIL.
>>>>>>>>
>>>>>>>> >
>>>>>>>> > While I don't think this is something that we need to consider
>>>>>>>> for 5.0, I would be very interested in the next major release being
>>>>>>>> able to use proper single-partition reads for partition-restricted index
>>>>>>>> queries, allowing them to take advantage of digest reads. (If single
>>>>>>>> partition slice queries do it, why not index queries?)
>>>>>>>>
>>>>>>>> This seems like an obvious yes, so reverse the question - is there
>>>>>>>> any reason why we WOULDNT want to do this?
>>>>>>>>
>>>>>>>> (Higher rate of mismatches requiring a second full read? Why would
>>>>>>>> 2i be more likely?)



Re: Status of CEP-1

2024-10-01 Thread Dinesh Joshi
Currently the Sidecar has a lot of functionality that is immediately usable
by the community. Apart from minor fixes, the AuthN/Z story would be
wrapped up soon. Post this, I would propose moving forward with cutting a
release with the existing feature set so we can get this in the hands of
our community.

On Tue, Oct 1, 2024 at 8:27 PM guo Maxwell  wrote:

> Have the same question: what’s the plan?
>
> On Wed, Oct 2, 2024 at 10:43 AM Jeff Jirsa wrote:
>
>>
>>
>> On Oct 1, 2024, at 7:26 PM, Josh McKenzie  wrote:
>>
>> However it is used by a number of other features as a dependency such as
>> analytics, backup/restore, repair, metrics, and CDC
>>
>> It seems like a natural pressure relief valve for moving operations out
>> of a core C* node that are well served out of process.
>>
>>
>> Yea, but the point of the foundation is to RELEASE software for the
>> public good, and the link asserting consensus was Dec 2018, so it’s 5.5
>> years and no releases.
>>
>> What’s the plan here?
>>
>>
>>
>>
>>


Re: [DISCUSS] Secondary Indexes and Single-Partition Reads

2024-10-01 Thread Jordan West
Agreed this would absolutely be a win. Don't see need for a flag either.

On Tue, Oct 1, 2024 at 1:31 PM Caleb Rackliffe 
wrote:

> Alrighty, with what looks like a fair amount of support, I'll declare
> CASSANDRA-19968  ready
> for some preliminary review.
>
> On Tue, Oct 1, 2024 at 2:41 PM Caleb Rackliffe 
> wrote:
>
>> We did add CASSANDRA-18940
>>  to make sure
>> local SAI post-filtering reads got picked up somewhere, but you're right
>> that StorageProxy#readRegular() would start recording some index queries
>> in the normal read metrics.
>>
>> On Tue, Oct 1, 2024 at 2:11 PM Jeremiah Jordan 
>> wrote:
>>
>>> Did we add new metrics for index queries?  The only issue I see is that
>>> this change will mix index queries into the regular read metrics, where
>>> before they were in the range metrics, so maybe some changes to metrics
>>> should go with it.  But I think this is a good change over all.
>>>
>>> On Oct 1, 2024 at 1:51:10 PM, Jon Haddad 
>>> wrote:
>>>
 This seems like it's strictly a win.  Doesn't sound to me like a flag
 is needed.

 On Tue, Oct 1, 2024 at 2:44 PM Caleb Rackliffe <
 calebrackli...@gmail.com> wrote:

> > (Higher rate of mismatches requiring a second full read? Why would
> 2i be more likely?)
>
> Right, I don't see any reason they should be more likely to actuate
> read-repair than slice queries are today...
>
> Didn't mention this above, but I'd obviously be open to having a
> system property that switches this behavior.
>
> On Tue, Oct 1, 2024 at 12:43 PM Jeff Jirsa  wrote:
>
>>
>>
>> > On Oct 1, 2024, at 10:28 AM, Caleb Rackliffe <
>> calebrackli...@gmail.com> wrote:
>> >
>> > Hello fellow secondary index enjoyers!
>> >
>> > If you're familiar with index queries, you probably know that they
>> are treated as range reads no matter what. This is true even if the user
>> query restricts results to a single partition. This means that they 
>> bypass
>> the digest read process that normal single-partition reads do.
>>
>> TIL.
>>
>> >
>> > While I don't think this is something that we need to consider for
>> 5.0, I would be very interested in the next major release being able to use
>> proper single-partition reads for partition-restricted index queries,
>> allowing them to take advantage of digest reads. (If single partition slice
>> queries do it, why not index queries?)
>>
>> This seems like an obvious yes, so reverse the question - is there
>> any reason why we WOULDN'T want to do this?
>>
>> (Higher rate of mismatches requiring a second full read? Why would 2i
>> be more likely?)
>>
>>


Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-10-01 Thread Mick Semb Wever
To play devil's advocate here, it's important that the subprojects don't
lose visibility and become siloed from the rest of the project.

There are different ways to solve this, and lumping everything into one
jira project is a messy and poor way of doing it.  But as the sidecar has
shown us, subproject activity should somehow be made noisy to us.  We need
some sorts of common spaces in the project.

If we go the separate jira project route, then some suggestions to help
with this are:
- Qbot notifications in #cassandra-dev and #cassandra-noise, as well as in
any subproject channels
- some cadence of dev@ ML updates, e.g. on activities, or dependency
changes, etc
- regular releases


On Tue, 9 Apr 2024 at 04:11, Dinesh Joshi  wrote:

> hi folks - sorry to have dropped the ball on responding to this thread.
>
> My 2 cents are as follows -
>
> 1. Having a separate JIRA project for each sub-project will add management
> overhead. This option, however, allows us to model unique workflows for the
> sub-project.
>
> 2. Managing the sub-project as part of the Cassandra JIRA project would
> imply less management overhead but the sub-project would need to conform to
> the same workflows.
>
> I would pick option 2 unless there is a strong reason and desire to manage
> a separate Jira project. We can always split out the Java Driver project if
> things don't work out. OTOH merging a Jira project is harder.
>
> Thanks,
>
> Dinesh
>
> On Thu, Apr 4, 2024 at 12:45 PM Abe Ratnofsky  wrote:
>
>> CEP-8 proposes using separate Jira projects per Cassandra sub-project:
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>>
>> > We suggest distinct Jira projects, one per driver, all to be created.
>>
>> I don't see any discussion changing that from the [DISCUSS] or vote
>> threads:
>> https://lists.apache.org/thread/01pljcncyjyo467l5orh8nf9okrh7oxm
>> https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
>> https://lists.apache.org/thread/crolkrhd4y6tt3k4hsy204xomshlcp4p
>>
>> But looks like upon acceptance that was changed:
>> https://lists.apache.org/thread/dhov01s8dvvh3882oxhkmmfv4tqdd68o
>>
>> > New issues will be tracked under the CASSANDRA project on Apache’s JIRA
>> under the component ‘Client/java-driver’.
>>
>> I'm in favor of using the same Jira as Cassandra proper. Committership is
>> project-wide, so having a standardized process (same ticket flow, review
>> rules, labels, etc.) is beneficial. But multiple votes happened based on
>> the content of the CEP, so we should stick to what was voted on and move to
>> a separate Jira.
>>
>> --
>> Abe
>>
>