gortiz opened a new issue, #17299: URL: https://github.com/apache/pinot/issues/17299
Row expressions are constructions defined in [SQL92](https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt). Their syntax is either ROW(v1, v2, ..., vN) or just (v1, v2, ..., vN), and the semantics is that they create a tuple. For example, a temporal table can be defined by a list of ROW values. Supporting row expressions would be awesome, but our typesystem is not expressive enough to do so and we don’t have the time to refactor something as sensible as the typesystem. But one place where row expressions are handy is to compare two rows in a query ordered by something. Specifically, they are very useful for implementing pagination effectively. For example, a query like ```sql SELECT whatever FROM table ORDER BY col1 LIMIT 10 OFFSET 1000000 ``` is very inefficient in any database, but especially in MSE. See, for example, [Pagination Done the Right Way by Markus Winand](https://www.youtube.com/watch?v=GzMaN-IX7wQ&list=PLV6pYAaZY60FGQrlXkyGIB2JG22FcikBb) (or the [slides](https://www.slideshare.net/slideshow/p2d2-pagination-done-the-postgresql-way/22210863)) Instead, users should use ```sql SELECT whatever, col1 FROM table WHERE col1 >= prev_col1 ORDER BY col1 LIMIT 10 ``` But the query becomes more complex when order by is ordering by more than one column, which is common in Pinot. As a way to avoid duplicates is to order by segment name and did: ```sql SELECT whatever, col1, $segmentName, $did FROM table WHERE col1 >= prev_col1 OR col1 == prev_col1 AND ( $segmentName > prev_segment_name OR ( $segmentName == prev_segment_name AND $did >= prev_did ) ) ORDER BY col1, $segmentName, $did LIMIT 10 ``` This is obviously very tedious to write and error-prone. Instead, the following query has the same semantics and is easier to write: ```sql SELECT whatever, col1, $segmentName, $did FROM table WHERE (col1, $segmentName, $did) >= (prev_col1, prev_segment_name, prev_did) ORDER BY col1, $segmentName, $did LIMIT 10 ``` The idea of this ticket is to: Add the ROW function to PinotOperatorTable so Calcite can parse ROW expressions. Add rules that modify comparisons between ROW expressions. For example, `ROW(a, b) > ROW(1, 2)` should be transformed to `a > 1 OR a = 1 AND b > 2`. The jOOQ documents the transformation [here](https://www.jooq.org/doc/latest/manual/sql-building/conditional-expressions/comparison-predicate-degree-n/). Add a rule that fails if it finds a ROW expression. Change the planner to apply rules defined in 2 before the rule created in 3. This way, we allow our users to use ROW expressions to compare columns, but at the same time don’t force us to support row expressions in our execution type system. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
