Re: [VOTE] Release Apache Cassandra 5.0.4

2025-04-08 Thread Brandon Williams
+1

On Mon, Apr 7, 2025 at 7:35 AM Brandon Williams
 wrote:
>
> Proposing the test build of Cassandra 5.0.4 for release.
>
> sha1: b81163b04b1d99036730ff233595d7bfb88611d1
> Git: https://github.com/apache/cassandra/tree/5.0.4-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1391/org/apache/cassandra/cassandra-all/5.0.4/
>
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/5.0.4/
>
> The vote will be open for 72 hours (longer if needed). Everyone who
> has tested the build is invited to vote. Votes by PMC members are
> considered binding. A vote passes if there are at least three binding
> +1s and no -1's.
>
> [1]: CHANGES.txt:
> https://github.com/apache/cassandra/blob/5.0.4-tentative/CHANGES.txt
> [2]: NEWS.txt: 
> https://github.com/apache/cassandra/blob/5.0.4-tentative/NEWS.txt
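The passing rule stated in the announcement can be written as a small check. This is a minimal Python sketch, not ASF tooling. `Vote` and `vote_passes` are hypothetical names. It assumes each voter casts a single vote, and it reads "no -1's" as "no -1 from anyone, binding or not", which is only one possible reading of the wording. The 72-hour voting window is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # True if cast by a PMC member


def vote_passes(votes: list[Vote]) -> bool:
    """At least three binding +1s and no -1s (any -1 is treated as blocking here)."""
    binding_plus_ones = sum(1 for v in votes if v.binding and v.value == 1)
    has_minus_one = any(v.value == -1 for v in votes)
    return binding_plus_ones >= 3 and not has_minus_one
```

Non-binding +1s are allowed and counted as votes, but under this rule they do not help reach the threshold of three.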


[DISCUSS] slack notifications for subprojects

2025-04-08 Thread Josh McKenzie
Currently we don't have Qbot notifying us on CASSSIDECAR ticket creation and 
state change. Seems we could:
 1. notify in #cassandra-dev and #cassandra-sidecar
 2. notify in the #cassandra-sidecar channel
My preference is for 1 since there's a tight relationship between what we're 
doing with the subprojects and the main db and there's probably shared interest 
there.

Any other opinions?

Re: [DISCUSS] slack notifications for subprojects

2025-04-08 Thread Brandon Williams
I'm +1 on "mimic current CASSANDRA tickets" as Ekaterina describes.

Kind Regards,
Brandon

On Tue, Apr 8, 2025 at 2:51 PM Ekaterina Dimitrova
 wrote:
>
> I’d say we mimic the current CASSANDRA tickets handling plus adding to the 
> #cassandra-sidecar. That means:
>
> 1) Open and close notifications to #cassandra-dev and #cassandra-sidecar
> 2) all other notifications to #cassandra-noise
> WDYT?
>
> On Tue, 8 Apr 2025 at 15:48, Josh McKenzie  wrote:
>>
>> Currently we don't have Qbot notifying us on CASSSIDECAR ticket creation and 
>> state change. Seems we could:
>>
>> notify in #cassandra-dev and #cassandra-sidecar
>> notify in the #cassandra-sidecar channel
>>
>> My preference is for 1 since there's a tight relationship between what we're 
>> doing with the subprojects and the main db and there's probably shared 
>> interest there.
>>
>> Any other opinions?
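Ekaterina's proposal amounts to a small routing rule for CASSSIDECAR tickets. The following minimal Python sketch shows how we read it. The event names (`"opened"`, `"closed"`, `"commented"`) and the function name are hypothetical. This is not Qbot's actual configuration format, which the thread does not show.

```python
# Hypothetical event names; Qbot's real configuration is not shown in the thread.
HIGH_SIGNAL_EVENTS = frozenset({"opened", "closed"})


def route_casssidecar_event(event: str) -> list[str]:
    """Return the Slack channels a CASSSIDECAR ticket event would go to under the proposal."""
    if event in HIGH_SIGNAL_EVENTS:
        # 1) Open and close notifications go to #cassandra-dev and #cassandra-sidecar.
        return ["#cassandra-dev", "#cassandra-sidecar"]
    # 2) All other notifications go to #cassandra-noise.
    return ["#cassandra-noise"]
```

Keeping the high-signal set down to open and close events is what limits how much automated traffic lands in the human-facing channels.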


Re: [VOTE] Release Apache Cassandra 5.0.4

2025-04-08 Thread C. Scott Andreas

+1. Important we get a release out that resolves CASSANDRA-20449: Serialization can lose
complex deletions in a mutation with multiple collections in a row.

On Apr 8, 2025, at 4:07 AM, Brandon Williams wrote:
> +1
>
> On Mon, Apr 7, 2025 at 7:35 AM Brandon Williams wrote:
> > Proposing the test build of Cassandra 5.0.4 for release.
> >
> > sha1: b81163b04b1d99036730ff233595d7bfb88611d1
> > Git: https://github.com/apache/cassandra/tree/5.0.4-tentative
> > Maven Artifacts:
> > https://repository.apache.org/content/repositories/orgapachecassandra-1391/org/apache/cassandra/cassandra-all/5.0.4/
> >
> > The Source and Build Artifacts, and the Debian and RPM packages and
> > repositories, are available here:
> > https://dist.apache.org/repos/dist/dev/cassandra/5.0.4/
> >
> > The vote will be open for 72 hours (longer if needed). Everyone who
> > has tested the build is invited to vote. Votes by PMC members are
> > considered binding. A vote passes if there are at least three binding
> > +1s and no -1's.
> >
> > [1]: CHANGES.txt:
> > https://github.com/apache/cassandra/blob/5.0.4-tentative/CHANGES.txt
> > [2]: NEWS.txt:
> > https://github.com/apache/cassandra/blob/5.0.4-tentative/NEWS.txt

Re: [DISCUSS] slack notifications for subprojects

2025-04-08 Thread Joel Shepherd
FWIW, my personal experience is that mixing automated notifications 
(beyond a very low volume) with human communications adds a bunch of 
noise to the human conversations and increases the risk of an 
interesting automated notification being missed (scrolling past them to 
get to the meatier human conversations).

I'm curious what the argument is for pumping ticket notifications into
#cassandra-dev, etc., versus pumping them into a dedicated channel.

Thanks -- Joel.

On 4/8/2025 12:51 PM, Ekaterina Dimitrova wrote:
> I’d say we mimic the current CASSANDRA tickets handling plus adding to
> the #cassandra-sidecar. That means:
>
> 1) Open and close notifications to #cassandra-dev and #cassandra-sidecar
> 2) all other notifications to #cassandra-noise
> WDYT?
>
> On Tue, 8 Apr 2025 at 15:48, Josh McKenzie wrote:
>>
>> Currently we don't have Qbot notifying us on CASSSIDECAR ticket
>> creation and state change. Seems we could:
>>
>>  1. notify in #cassandra-dev and #cassandra-sidecar
>>  2. notify in the #cassandra-sidecar channel
>>
>> My preference is for 1 since there's a tight relationship between
>> what we're doing with the subprojects and the main db and there's
>> probably shared interest there.
>>
>> Any other opinions?


Re: Per partition local ordering

2025-04-08 Thread Dave Herrington
Patrick and Jeff,

I have to chime in with an opinion having been a SQL person for more than
30 years...

The DISTINCT concept is a little confusing to me, since in SQL, DISTINCT
collapses duplicate rows (rows where all of the selected values repeat)
into a unique result set, rather than just returning the first and last
values within each partition. I studied Patrick's suggestion, but the
SQL side of my brain was struggling with the idea.

Here is what my SQL brain came up with...

SQL has FIRST_VALUE and LAST_VALUE window functions that, used with an
OVER (PARTITION BY ... ORDER BY ...) clause, return the first or last value
in each partition of the data.

This SQL Server doc page shows the syntax for FIRST_VALUE:
https://learn.microsoft.com/en-us/sql/t-sql/functions/first-value-transact-sql?view=sql-server-ver16

The SQL syntax is more elaborate, because partitioning the data can be
flexible within the query, but in Cassandra the partitioning is fixed, and
ordering is determined by the clustering columns.

Borrowing from the SQL concept for Cassandra CQL, this would be easily
understandable by my SQL brain:

SELECT device_id, sensor_id,
FIRST_VALUE(time) AS first_time,
FIRST_VALUE(value) AS first_value,
LAST_VALUE(time) AS last_time,
LAST_VALUE(value) AS last_value
FROM data
WHERE device_id = 'mydevice' AND sensor_id IN ('s1', 's2', 's3')
ORDER BY time ASC;

The ORDER BY would control the order in which the FIRST and LAST values are
evaluated.

This looks a bit like Cassandra's existing aggregate functions, such as
MIN() and MAX().

-Dave
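To make the proposed FIRST_VALUE/LAST_VALUE semantics concrete, here is a minimal client-side Python sketch of what such a query would return for the `data` table in this thread. It is only an illustration under assumptions. `Row` and `first_and_last_per_partition` are hypothetical names, the rows are assumed to be already fetched into memory, and each `(device_id, sensor_id)` pair is treated as one partition ordered by `time` ascending.

```python
from datetime import datetime
from itertools import groupby
from operator import attrgetter
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    """One row of the `data` table: PRIMARY KEY((device_id, sensor_id), time)."""
    device_id: str
    sensor_id: str
    time: datetime
    value: bytes


def first_and_last_per_partition(rows: Iterable[Row]) -> list[dict]:
    """Return the first and last (time, value) of each (device_id, sensor_id) partition.

    Hypothetical client-side emulation of the proposed FIRST_VALUE/LAST_VALUE
    query; partitions are emitted in (device_id, sensor_id) order.
    """
    partition_key = attrgetter("device_id", "sensor_id")
    result = []
    # groupby only merges adjacent rows, so sort by partition key first.
    for (device_id, sensor_id), group in groupby(sorted(rows, key=partition_key), key=partition_key):
        # Within a partition, order by the clustering column (ORDER BY time ASC).
        ordered = sorted(group, key=attrgetter("time"))
        first, last = ordered[0], ordered[-1]
        result.append({
            "device_id": device_id,
            "sensor_id": sensor_id,
            "first_time": first.time,
            "first_value": first.value,
            "last_time": last.time,
            "last_value": last.value,
        })
    return result
```

Under PostgreSQL's DISTINCT ON rules, Patrick's `DISTINCT ON (sensor_id) ... ORDER BY sensor_id, time DESC` keeps the row with the latest `time` for each sensor, which corresponds roughly to the `last_*` fields here.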

On Mon, Apr 7, 2025 at 2:23 PM Jeff Jirsa  wrote:

> Not Patrick, but:
>
> - would also love being closer to SQL
> - there’s no work on this specific grammar, yet
> - it would depend on a real query optimizer, which IS somewhat in flight
> (or at least a cost based optimizer was proposed)
>
> > On Apr 7, 2025, at 2:05 PM, Artem Golovko 
> wrote:
> >
> > Hi Patrick,
> >
> > Really good point. I hadn't even thought about it, and had completely
> > forgotten that ORDER BY with DISTINCT sorts the result within each
> > group only and has nothing to do with the ordering of the final
> > result. I totally agree that aligning CQL with standard SQL behavior
> > would be a great idea. By the way, are there any open projects or
> > discussions around this? Or is it still just an internal PoC at this
> > stage?
> >
> > Artem
> >
> > Fri, Apr 4, 2025 at 17:02, Patrick McFadin wrote:
> >>
> >> I played around with this idea by simulating it in ChatGPT (yes, you can
> do that). It occurred to me that this is functionality similar to SQL's
> DISTINCT keyword. Seeing how we can align CQL with SQL is something I'm
> personally investing more time in for the long term of the project. This
> could be an opportunity to get one step closer with useful syntax.
> >>
> >> Re-arranging your idea in SQL syntax, it would look like this:
> >>
> >> SELECT DISTINCT ON (sensor_id) device_id, sensor_id, time, value
> >> FROM data
> >> WHERE device_id = 'mydevice'
> >>  AND sensor_id IN ('s1', 's2', 's3')
> >> ORDER BY sensor_id, time DESC;
> >>
> >> I think this gives the same outcome with a similar partition-level
> implementation. DISTINCT on a multi-partition query would return the first
> value of each partition. This would work especially well with primary keys
> like: PRIMARY KEY((device_id, sensor_id), time)
> >>
> >> In the long term, this avoids building up more unique syntax, which I
> really prefer.
> >>
> >> Patrick
> >>
> >>> On Tue, Apr 1, 2025 at 9:55 AM Artem Golovko 
> wrote:
> >>>
> >>> Hello everyone,
> >>>
> >>> I did not find any discussions about this topic and would like to ask
> >>> whether there has been any consideration of introducing "PER PARTITION
> >>> ORDER" functionality. This duplicates a question on the ScyllaDB
> >>> forum, but now for Cassandra:
> https://forum.scylladb.com/t/per-partition-local-ordering/3412.
> >>> I'm not very experienced with the Cassandra code base, but as far as
> >>> I know it should make sense.
> >>>
> >>> Let me introduce the use case.
> >>>
> >>> Data model:
> >>>
> >>> CREATE TABLE data(
> >>>   device_id TEXT,
> >>>   sensor_id TEXT,
> >>>   time TIMESTAMP,
> >>>   value BLOB,
> >>>   PRIMARY KEY((device_id, sensor_id), time)
> >>> )
> >>>
> >>> Queries: Give me the first and the last value for all sensors within
> a device.
> >>>
> >>> Problem: A single device can have 10k sensors or more, and if we
> >>> wanted to get a "snapshot" (e.g. a list of sensors with the values
> >>> having the max timestamp), it may take lots of round trips of small
> >>> requests and responses. Therefore we can use the "IN" clause here,
> >>> grouping keys by replica node (e.g. a batched, node-aware read).
> >>>
> >>> 1. First point
> >>> SELECT * FROM data WHERE device_id = 'mydevice' AND sensor_id IN ('s1',
> >>> 's2', 's3') PER PARTITION LIMIT 1
> >>>
> >>> Here we can get the first point for each partition and don’t care
> >>> about “global” or

Re: [DISCUSS] slack notifications for subprojects

2025-04-08 Thread Jeremiah Jordan
 +1 from me for that proposal.

On Apr 8, 2025 at 2:51:09 PM, Ekaterina Dimitrova 
wrote:

> I’d say we mimic the current CASSANDRA tickets handling plus adding to the
> #cassandra-sidecar. That means:
>
> 1) Open and close notifications to #cassandra-dev and #cassandra-sidecar
> 2) all other notifications to #cassandra-noise
> WDYT?
>
> On Tue, 8 Apr 2025 at 15:48, Josh McKenzie  wrote:
>
>> Currently we don't have Qbot notifying us on CASSSIDECAR ticket creation
>> and state change. Seems we could:
>>
>>1. notify in #cassandra-dev and #cassandra-sidecar
>>2. notify in the #cassandra-sidecar channel
>>
>> My preference is for 1 since there's a tight relationship between what
>> we're doing with the subprojects and the main db and there's probably
>> shared interest there.
>>
>> Any other opinions?
>>
>


Re: [DISCUSS] slack notifications for subprojects

2025-04-08 Thread Francisco Guerrero
+1. Just one clarification, we already have CASSSIDECAR notifications going to 
#cassandra-dev [1]. But I think we should also have them in #cassandra-sidecar

Best,
- Francisco

[1] https://issues.apache.org/jira/browse/INFRA-26216

On 2025/04/08 20:05:35 Jeremiah Jordan wrote:
>  +1 from me for that proposal.
> 
> On Apr 8, 2025 at 2:51:09 PM, Ekaterina Dimitrova 
> wrote:
> 
> > I’d say we mimic the current CASSANDRA tickets handling plus adding to the
> > #cassandra-sidecar. That means:
> >
> > 1) Open and close notifications to #cassandra-dev and #cassandra-sidecar
> > 2) all other notifications to #cassandra-noise
> > WDYT?
> >
> > On Tue, 8 Apr 2025 at 15:48, Josh McKenzie  wrote:
> >
> >> Currently we don't have Qbot notifying us on CASSSIDECAR ticket creation
> >> and state change. Seems we could:
> >>
> >>1. notify in #cassandra-dev and #cassandra-sidecar
> >>2. notify in the #cassandra-sidecar channel
> >>
> >> My preference is for 1 since there's a tight relationship between what
> >> we're doing with the subprojects and the main db and there's probably
> >> shared interest there.
> >>
> >> Any other opinions?
> >>
> >
> 


Re: [DISCUSS] slack notifications for subprojects

2025-04-08 Thread Josh McKenzie
Yep - this is classic "Ready, Fire, Aim" from me.

But we did hit a new target (adding things to #cassandra-sidecar), so that's a 
plus. :D

On Tue, Apr 8, 2025, at 4:12 PM, Francisco Guerrero wrote:
> +1. Just one clarification, we already have CASSSIDECAR notifications going 
> to #cassandra-dev [1]. But I think we should also have them in 
> #cassandra-sidecar
> 
> Best,
> - Francisco
> 
> [1] https://issues.apache.org/jira/browse/INFRA-26216
> 
> On 2025/04/08 20:05:35 Jeremiah Jordan wrote:
> >  +1 from me for that proposal.
> > 
> > On Apr 8, 2025 at 2:51:09 PM, Ekaterina Dimitrova 
> > wrote:
> > 
> > > I’d say we mimic the current CASSANDRA tickets handling plus adding to the
> > > #cassandra-sidecar. That means:
> > >
> > > 1) Open and close notifications to #cassandra-dev and #cassandra-sidecar
> > > 2) all other notifications to #cassandra-noise
> > > WDYT?
> > >
> > > On Tue, 8 Apr 2025 at 15:48, Josh McKenzie  wrote:
> > >
> > >> Currently we don't have Qbot notifying us on CASSSIDECAR ticket creation
> > >> and state change. Seems we could:
> > >>
> > >>1. notify in #cassandra-dev and #cassandra-sidecar
> > >>2. notify in the #cassandra-sidecar channel
> > >>
> > >> My preference is for 1 since there's a tight relationship between what
> > >> we're doing with the subprojects and the main db and there's probably
> > >> shared interest there.
> > >>
> > >> Any other opinions?
> > >>
> > >
> > 
>