gortiz opened a new issue, #17299:
URL: https://github.com/apache/pinot/issues/17299

   Row expressions are constructions defined in 
[SQL92](https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt). Their 
syntax is either ROW(v1, v2, ..., vN) or just (v1, v2, ..., vN), and the 
semantics is that they create a tuple. For example, a temporal table can be 
defined by a list of ROW values.
   
   Supporting row expressions would be awesome, but our typesystem is not 
expressive enough to do so and we don’t have the time to refactor something as 
sensible as the typesystem. But one place where row expressions are handy is to 
compare two rows in a query ordered by something. Specifically, they are very 
useful for implementing pagination effectively.
   
   For example, a query like 
   
   ```sql
   SELECT whatever 
   FROM table 
   ORDER BY col1 
   LIMIT 10 
   OFFSET 1000000
   ```
   
   is very inefficient in any database, but especially in MSE. See, for 
example, [Pagination Done the Right Way by Markus 
Winand](https://www.youtube.com/watch?v=GzMaN-IX7wQ&list=PLV6pYAaZY60FGQrlXkyGIB2JG22FcikBb)
 (or the 
[slides](https://www.slideshare.net/slideshow/p2d2-pagination-done-the-postgresql-way/22210863))
   
   Instead, users should use
   
   ```sql
   SELECT whatever, col1 
   FROM table
   WHERE col1 >= prev_col1
   ORDER BY col1 
   LIMIT 10 
   ```
   
   But the query becomes more complex when order by is ordering by more than 
one column, which is common in Pinot. As a way to avoid duplicates is to order 
by segment name and did:
   
   ```sql
   SELECT whatever, col1, $segmentName, $did 
   FROM table
   WHERE col1 >= prev_col1
     OR col1 == prev_col1 AND (
       $segmentName > prev_segment_name OR (
         $segmentName == prev_segment_name AND $did >= prev_did
       )
     )
   ORDER BY col1, $segmentName, $did 
   LIMIT 10
   ```
   
   This is obviously very tedious to write and error-prone.
   
   Instead, the following query has the same semantics and is easier to write:
   
   ```sql
   SELECT whatever, col1, $segmentName, $did 
   FROM table
   WHERE (col1, $segmentName, $did) >= (prev_col1, prev_segment_name, prev_did)
   ORDER BY col1, $segmentName, $did 
   LIMIT 10
   ```
   
   The idea of this ticket is to:
   
   Add the ROW function to PinotOperatorTable so Calcite can parse ROW 
expressions.
   
   Add rules that modify comparisons between ROW expressions. For example, 
`ROW(a, b) > ROW(1, 2)` should be transformed to `a > 1 OR a = 1 AND b > 2`. 
The jOOQ documents the transformation 
[here](https://www.jooq.org/doc/latest/manual/sql-building/conditional-expressions/comparison-predicate-degree-n/).
   
   Add a rule that fails if it finds a ROW expression.
   
   Change the planner to apply rules defined in 2 before the rule created in 3.
   
   This way, we allow our users to use ROW expressions to compare columns, but 
at the same time don’t force us to support row expressions in our execution 
type system.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to