CalvinKirs opened a new pull request, #63675:
URL: https://github.com/apache/doris/pull/63675

   This PR adds CTE producer-side limit pushdown in Nereids.
   
     When all CTE consumers only need a bounded number of rows, the optimizer 
collects the required row count from each consumer, takes
     the maximum value, and pushes that limit into the CTE producer. The 
original consumer-side limit is still kept.
   
     The rule only handles safe shapes:
   
     ```text
     LogicalLimit
       LogicalCTEConsumer
     ```
   
     ```text
     LogicalLimit
       LogicalProject
         LogicalCTEConsumer
     ```
   
     The project must be row-preserving.
   
     ## Scenarios
   
     ### 1. Direct Limit
   
     ```sql
     WITH cte AS (
         SELECT * FROM orders
     )
     SELECT * FROM cte
     LIMIT 10;
     ```
   
     The consumer only needs 10 rows, so the CTE producer can produce at most 
10 rows.
   
     ### 2. Project + Limit
   
     ```sql
     WITH cte AS (
         SELECT order_id, total_price, user_id FROM orders
     )
     SELECT order_id, total_price
     FROM cte
     LIMIT 10;
     ```
   
     A normal project only prunes columns and does not change row count, so the 
producer can still be limited to 10 rows.
   
     ### 3. Multiple Consumers + Limit
   
     ```sql
     WITH cte AS (
         SELECT * FROM orders
     )
     SELECT * FROM cte LIMIT 10
     UNION ALL
     SELECT * FROM cte LIMIT 20;
     ```
   
     For multiple CTE consumers, the producer limit is:
   
     ```text
     producerLimit = max(consumerLimit1, consumerLimit2, ...)
     ```
   
     In this case, the pushed producer limit is 20.
   
     If any consumer needs full CTE data, pushdown is skipped:
   
     ```sql
     WITH cte AS (
         SELECT * FROM orders
     )
     SELECT * FROM cte LIMIT 10
     UNION ALL
     SELECT * FROM cte;
     ```
   
     ### 4. Limit + Offset
   
     ```sql
     WITH cte AS (
         SELECT * FROM orders
     )
     SELECT * FROM cte
     LIMIT 10 OFFSET 100;
     ```
   
     The consumer needs to skip 100 rows and then return 10 rows, so the 
producer must provide at least 110 rows.
   
     The producer side only truncates rows and does not apply offset:
   
     ```text
     producerLimit = limit + offset
     producerOffset = 0
     ```
   
     ### 5. SplitLimit
   
     ```sql
     WITH cte AS (
         SELECT * FROM orders
     )
     SELECT * FROM cte
     LIMIT 10 OFFSET 100;
     ```
   
     Doris may split this into local/global limits. The local limit closest to 
the CTE consumer already represents `limit + offset`.
   
     The collector uses the local limit value directly and does not add offset 
again.
   
     ### 6. Filter + Limit Is Not Matched
   
     ```sql
     WITH cte AS (
         SELECT * FROM orders
     )
     SELECT * FROM cte
     WHERE order_id > 10
     LIMIT 10;
     ```
   
     Filter can reduce rows before limit, so the producer may need more than 10 
input rows. This rule does not push limit through filter.
   
     ### 7. TopN Is Not Matched
   
     ```sql
     WITH cte AS (
         SELECT * FROM orders
     )
     SELECT * FROM cte
     ORDER BY order_id
     LIMIT 10;
     ```
   
     `ORDER BY ... LIMIT` is TopN. It needs the first N rows after ordering, so 
it cannot be treated as a normal limit.
   
     ### 8. Join / Aggregate / Window / Sort Are Not Matched
   
     ```sql
     WITH cte AS (
         SELECT * FROM orders
     )
     SELECT *
     FROM cte JOIN users ON cte.user_id = users.user_id
     LIMIT 10;
     ```
   
     ```sql
     WITH cte AS (
         SELECT * FROM orders
     )
     SELECT user_id, COUNT(*)
     FROM cte
     GROUP BY user_id
     LIMIT 10;
     ```
   
     ```sql
     WITH cte AS (
         SELECT * FROM orders
     )
     SELECT *
     FROM (
         SELECT order_id, ROW_NUMBER() OVER (ORDER BY order_id) AS rn
         FROM cte
     ) t
     LIMIT 10;
     ```
   
     ```sql
     WITH cte AS (
         SELECT * FROM orders
     )
     SELECT *
     FROM (
         SELECT * FROM cte ORDER BY order_id
     ) t
     LIMIT 10;
     ```
   
     These operators can change row cardinality or ordering semantics. Unless 
other rules have already rewritten the shape into `Limit ->
     CTEConsumer` or `Limit -> Project -> CTEConsumer`, this collector skips 
them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to