Re: Per partition local ordering

Dave Herrington Tue, 08 Apr 2025 10:37:30 -0700

Patrick and Jeff,

I have to chime in with an opinion having been a SQL person for more than
30 years...


The DISTINCT concept is a little confusing to me, since, in SQL, DISTINCT
reduces a repeating result set to a unique result set (where all of the
selected values are repeating), rather than just returning the first and
last values within each partition.  I studied Patrick's suggestion, but the
SQL side of my brain was struggling with the idea.

Here is what my SQL brain came up with...

SQL has a FIRST_VALUE and LAST_VALUE function that is used to partition the
data and then grab the first or last values in each partition.

This is a SQL Server doc page that shows the syntax for FIRST_VALUE:
https://learn.microsoft.com/en-us/sql/t-sql/functions/first-value-transact-sql?view=sql-server-ver16
.

The SQL syntax is more elaborate, because partitioning the data can be
flexible within the query, but in Cassandra the partitioning is fixed, and
ordering is determined by the clustering columns.

Borrowing from the SQL concept for Cassandra CQL, this would be easily
understandable by my SQL brain:

SELECT device_id, sensor_id,
    FIRST_VALUE(time) AS first_time,
    FIRST_VALUE(value) AS first_value,
    LAST_VALUE(time) AS last_time,
    LAST_VALUE(value) AS last_value
FROM data
WHERE deviceId = 'mydevice' and sensor_id IN (‘s1’, ‘s2’, ‘s3’)
ORDER BY time ASC;

The ORDER BY would control the order in which the FIRST and LAST values are
evaluated.

This looks a bit like the Cassandra aggregate functions like MIN() and
MAX().

-Dave







On Mon, Apr 7, 2025 at 2:23 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Not Patrick, but:
>
> - would also love being closer to SQL
> - there’s no work on this specific grammar, yet
> - it would depend on a real query optimizer, which IS somewhat in flight
> (or at least a cost based optimizer was proposed)
>
> > On Apr 7, 2025, at 2:05 PM, Artem Golovko <artemgolovk...@gmail.com>
> wrote:
> >
> > Hi Patrick,
> >
> > Really good point, I even did not think about it and actually
> > completely forgot that ORDER BY with DISTINCT will sort the result
> > within the group only, but has nothing with ordering of the final
> > result. I totally agree that aligning CQL with standard SQL behavior
> > would be a great idea. By the way, are there any open projects or
> > discussions around this? Or is it still just an internal PoC at this
> > stage?
> >
> > Artem
> >
> > пт, 4 апр. 2025 г. в 17:02, Patrick McFadin <pmcfa...@gmail.com>:
> >>
> >> I played around with this idea by simulating it in ChatGPT (Yes you can
> do that) It occurred to me that this is similar SQL functionality to the
> DISTINCT keyword. Seeing how we can align CQL with SQL is something I'm
> personally investing more time in for the long-term of the project. This
> could be an opportunity to get one step closer with useful syntax.
> >>
> >> Re-arranging your idea in SQL syntax, it would look like this:
> >>
> >> SELECT DISTINCT ON (sensor_id) device_id, sensor_id, time, value
> >> FROM data
> >> WHERE device_id = 'mydevice'
> >>  AND sensor_id IN ('s1', 's2', 's3')
> >> ORDER BY sensor_id, time DESC;
> >>
> >> I think this is the same outcome and similar partition-level
> implementation. DISTINCT on a multi-partition query would return the first
> value of each partition. This would especially work in these types of
> primary keys: PRIMARY KEY((device_id, sensor_id), time)
> >>
> >> In the long term, we don't have more unique syntax building up, which I
> really prefer.
> >>
> >> Patrick
> >>
> >>> On Tue, Apr 1, 2025 at 9:55 AM Artem Golovko <artemgolovk...@gmail.com>
> wrote:
> >>>
> >>> Hello everyone,
> >>>
> >>> I did not find any discussions about that topic and would like to ask
> >>> if there any considerations to introduce the "PER PARTITION ORDER"
> >>> functionality. It's a duplication of Scylla question, but now for
> >>> Cassandra
> https://forum.scylladb.com/t/per-partition-local-ordering/3412.
> >>> I am also not so experienced from the cassandra code implementation
> >>> point of view, but according to my knowledge it should make sense.
> >>>
> >>> Let me introduce the use case.
> >>>
> >>> Data model:
> >>>
> >>> CREATE TABLE data(
> >>>   device_id TEXT,
> >>>   sensor_id TEXT,
> >>>   time TIMESTAMP,
> >>>   value BLOB,
> >>>   PRIMARY KEY((device_id, sensor_id), time)
> >>> )
> >>>
> >>> Queries: Give me the first and the last value for all sensors within
> deviceId.
> >>>
> >>> Problem: Within the device it's possible to have 10k of sensors or
> >>> more and if we wanted to get a "snapshot" (e.g. list of sensors with
> >>> values having the max timestamp) then it may take lots of round trips
> >>> for small request-response. Therefore we can use the "IN" clause here,
> >>> grouping keys based on the replica node (e.g. batch node aware read).
> >>>
> >>> 1. First point
> >>> SELECT * FROM data WHERE deviceId = 'mydevice' and sensor_id IN (‘s1’,
> >>> ‘s2’, ‘s3’) PER PARTITION LIMIT 1
> >>>
> >>> Here we can get the first point for each partition and don’t care
> >>> about “global” ordering, so the resulting rows won’t be sorted by
> >>> clustering key and natural order will be applied only locally within
> >>> each partition.
> >>>
> >>> 2. Last point
> >>> SELECT * FROM data WHERE deviceId = 'mydevice' and sensor_id IN (‘s1’,
> >>> ‘s2’, ‘s3’) ORDER BY time DESC PER PARTITION LIMIT 1
> >>>
> >>> It’s not possible to use IN and ORDER BY together with paging enabled.
> >>> The reason is that Cassandra applies order “localy” within each
> >>> partition, but also applies it “globally” across the resulting rows
> >>> that makes cassandra store the result in-memory to apply “global”
> >>> sorting. But if I don’t care about “global” ordering and only want to
> >>> specify ordering within each partition that introduces performance
> >>> overhead.
> >>>
> >>> What if to introduce "PER PARTITION ORDER" statement? In most of the
> >>> use cases it should not introduce much benefits, because we're limited
> >>> to the number of keys in the IN clause (by default 100), so the result
> >>> should not be big enough to do not fit into the memory, but maybe
> >>> someone have another use case when PER PARTITION LIMIT more than 1 or
> >>> payload is big enough.
>


-- 
-Dave

David A. Herrington II
President and Chief Engineer
RhinoSource, Inc.

*Data Lake Architecture, Cloud Computing and Advanced Analytics.*

www.rhinosource.com

Re: Per partition local ordering

Reply via email to