Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-10-15 Thread Bernardo Botella
Hi everyone, I’d like to request one final round of feedback on CEP-44: Kafka integration for Cassandra CDC using Sidecar before calling a vote. We’ll leave it open for a few more days, and if nothing else comes in, we will call a vote. Bernardo > On Oct 1, 2024, at 6:58 AM,

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-10-01 Thread James Berragan
It seems this has triggered some important discussions about CEP-1 and the Sidecar. Let's keep those in their respective threads and focus this conversation on CEP-44. Patrick, I think I missed your point "There is also little mention of where the increased resource load would be handled." - you'r

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Josh McKenzie
> This is the type of hidden subproject that will get us into trouble with the > board/foundation. I'm sure it's getting enough committer eyeballs, and some > PMC oversight, but maybe not enough. I don't agree with the qualifier of it as being hidden. It's definitely lower traffic than the mai

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Patrick McFadin
I'm mentioning it because I was surprised and I feel like I generally have a finger on the pulse of the project. I would love to talk about it more and get more community support and interest. On Mon, Sep 30, 2024 at 11:01 AM Mick Semb Wever wrote: > Agree with Jon, Josh and Patrick here. > > T

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Mick Semb Wever
Agree with Jon, Josh and Patrick here. This is the type of hidden subproject that will get us into trouble with the board/foundation. I'm sure it's getting enough committer eyeballs, and some PMC oversight, but maybe not enough. Addressing the more material points that Jon mentions is the best

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Jon Haddad
I think it depends on what lens you're looking at the sidecar through. If you're actively working on it, and pulling it into your own infra, sure. It's a thing. If you're an outsider? I have a hard time seeing it. - No documentation as to what it does - No releases - No build instructions - Tr

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread James Berragan
Thanks for the discussions. I do anticipate that Accord will make things much better; however, I think if consumers are ultimately going to replay the log into some other system (say Apache Iceberg), exactly-once delivery will always be tricky, but perhaps not entirely necessary given the linea

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Josh McKenzie
The CEP for the sidecar has stalled. The sidecar itself is very much alive and a thing. CEP != artifact. We should definitely clean that up though. On Mon, Sep 30, 2024, at 10:59 AM, Dinesh Joshi wrote: > Patrick, could you please elaborate? The Sidecar has been a thing for a while > now. > >

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Dinesh Joshi
Patrick, could you please elaborate? The Sidecar has been a thing for a while now. On Mon, Sep 30, 2024 at 7:51 AM Patrick McFadin wrote: > I made the mistake of asking two things in one email. > > First thing I asked. Sidecar? Stalled CEP so why is this being talked > about like this is a thing

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Patrick McFadin
I made the mistake of asking two things in one email. First thing I asked. Sidecar? Stalled CEP so why is this being talked about like this is a thing? On Mon, Sep 30, 2024 at 7:21 AM Benedict wrote: > Sorry Bernardo, you may have misunderstood me. I don’t have any concerns, > I was suggesting

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Benedict
Sorry Bernardo, you may have misunderstood me. I don’t have any concerns, I was suggesting a possible future scenario where CDC for Kafka via sidecar is changed to use a hypothetical future topic subscription service provided by C*. It was meant to show that this CEP may be easily decoupled from an

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Bernardo Botella
Thanks everyone for the comments. Patrick: The proposal includes a “best effort” approach for deduplication (some details can be found in the Digest class comments in the PR here https://github.com/apache/cassandra-analytics/pull/87/files#diff-3a09caecc1da13419d92cde56a7cfc7d253faac08182e6c2768b
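
To make the "best effort" deduplication idea above concrete, here is a minimal sketch. It is not the Digest class from the PR: the class name `BestEffortDeduplicator`, the fields that go into the digest, and the LRU eviction policy are all assumptions chosen for illustration. The idea shown is that the same mutation arriving from several replicas hashes to the same digest, so later copies can be dropped, but only while the digest is still held in a bounded cache.

```python
import hashlib
from collections import OrderedDict


class BestEffortDeduplicator:
    """Hypothetical helper: drops mutations whose digest was seen recently.

    Memory is bounded, so a duplicate that arrives after its digest has been
    evicted is published again. That is why this is only "best effort".
    """

    def __init__(self, capacity: int = 10_000) -> None:
        self.capacity = capacity
        self._seen = OrderedDict()  # digest -> None, oldest entry first

    @staticmethod
    def digest(partition_key: bytes, payload: bytes, write_timestamp: int) -> bytes:
        # Assumed digest inputs; the real proposal may hash different fields.
        h = hashlib.sha256()
        h.update(len(partition_key).to_bytes(4, "big"))  # length prefix avoids ambiguity
        h.update(partition_key)
        h.update(write_timestamp.to_bytes(8, "big", signed=True))
        h.update(payload)
        return h.digest()

    def should_publish(self, partition_key: bytes, payload: bytes, write_timestamp: int) -> bool:
        d = self.digest(partition_key, payload, write_timestamp)
        if d in self._seen:
            self._seen.move_to_end(d)  # refresh recency, drop the duplicate
            return False
        self._seen[d] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the least recently seen digest
        return True


if __name__ == "__main__":
    dedup = BestEffortDeduplicator(capacity=2)
    # The same mutation read from two replicas: only the first copy is published.
    print(dedup.should_publish(b"user:1", b"name=alice", 1727700000))
    print(dedup.should_publish(b"user:1", b"name=alice", 1727700000))
```

A consumer downstream of such a filter would still need to tolerate occasional duplicates, which matches the "left to the end user" concern raised elsewhere in the thread.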

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Josh McKenzie
> I don't see much on how this would be handled other than "left to the end > user to figure out." My immediate thought when I read that was "Yes. But it's moving where we draw the line of 'left to the end user to figure out' *much further* than it was before". This should only be necessary in

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread Benedict
Yes, with Accord it should be fairly easy to have reliable no-dupe log streaming without an elected leader. Given the broad set of use cases, I can imagine supporting some more native topic subscription API in C* rather than requiring Kafka, so perhaps any integration of Kafka with the sidecar can

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-29 Thread Jeff Jirsa
Transactional metadata and Accord should make it MUCH easier to do duplication-avoiding CDC (and I was going to note that someone should ensure that the interfaces exposed to the public are stable enough not to change the published API once those exist) On Sep 29, 2024, at 7:04 PM, Patrick McFadin

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-29 Thread Patrick McFadin
As I was reviewing this, it occurred to me that it was talking about Sidecar like it was a thing but that CEP has been stalled for quite some time: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224 If work on this is being done, should we get this official and wrapped up?

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-28 Thread Jon Haddad
Yes! I’m really looking forward to trying this out. The CEP looks really well thought out. I think this will make CDC a lot more useful for a lot of teams. Jon On Fri, Sep 27, 2024 at 4:23 PM Josh McKenzie wrote: > Really excited to see this hit the ML James. > > As author of the base CDC (get

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-27 Thread Josh McKenzie
Really excited to see this hit the ML, James. As author of the base CDC (get your stones ready for throwing :D) and someone moderately involved in the CEP here, definitely welcome any questions. CDC is a *thorny problem* in a multi-replica distributed system like this. On Fri, Sep 27, 2024, at

[DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-27 Thread James Berragan
Hi everyone, Wiki: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-44%3A+Kafka+integration+for+Cassandra+CDC+using+Sidecar We would like to propose this CEP for adoption by the community. CDC is a common technique in databases but right now there is no out-of-the-box solution to do thi