CalvinKirs opened a new pull request, #63675:
URL: https://github.com/apache/doris/pull/63675
This PR adds CTE producer-side limit pushdown in Nereids.
When all CTE consumers only need a bounded number of rows, the optimizer
collects the required row count from each consumer, takes
the maximum value, and pushes that limit into the CTE producer. The
original consumer-side limit is still kept.
The rule only handles safe shapes:
```text
LogicalLimit
LogicalCTEConsumer
```
```text
LogicalLimit
LogicalProject
LogicalCTEConsumer
```
The project must be row-preserving.
## Scenarios
### 1. Direct Limit
```sql
WITH cte AS (
SELECT * FROM orders
)
SELECT * FROM cte
LIMIT 10;
```
The consumer only needs 10 rows, so the CTE producer can produce at most
10 rows.
### 2. Project + Limit
```sql
WITH cte AS (
SELECT order_id, total_price, user_id FROM orders
)
SELECT order_id, total_price
FROM cte
LIMIT 10;
```
A normal project only prunes columns and does not change row count, so the
producer can still be limited to 10 rows.
### 3. Multiple Consumers + Limit
```sql
WITH cte AS (
SELECT * FROM orders
)
SELECT * FROM cte LIMIT 10
UNION ALL
SELECT * FROM cte LIMIT 20;
```
For multiple CTE consumers, the producer limit is:
```text
producerLimit = max(consumerLimit1, consumerLimit2, ...)
```
In this case, the pushed producer limit is 20.
If any consumer needs full CTE data, pushdown is skipped:
```sql
WITH cte AS (
SELECT * FROM orders
)
SELECT * FROM cte LIMIT 10
UNION ALL
SELECT * FROM cte;
```
### 4. Limit + Offset
```sql
WITH cte AS (
SELECT * FROM orders
)
SELECT * FROM cte
LIMIT 10 OFFSET 100;
```
The consumer needs to skip 100 rows and then return 10 rows, so the
producer must provide at least 110 rows.
The producer side only truncates rows and does not apply offset:
```text
producerLimit = limit + offset
producerOffset = 0
```
### 5. SplitLimit
```sql
WITH cte AS (
SELECT * FROM orders
)
SELECT * FROM cte
LIMIT 10 OFFSET 100;
```
Doris may split this into local/global limits. The local limit closest to
the CTE consumer already represents `limit + offset`.
The collector uses the local limit value directly and does not add offset
again.
### 6. Filter + Limit Is Not Matched
```sql
WITH cte AS (
SELECT * FROM orders
)
SELECT * FROM cte
WHERE order_id > 10
LIMIT 10;
```
Filter can reduce rows before limit, so the producer may need more than 10
input rows. This rule does not push limit through filter.
### 7. TopN Is Not Matched
```sql
WITH cte AS (
SELECT * FROM orders
)
SELECT * FROM cte
ORDER BY order_id
LIMIT 10;
```
`ORDER BY ... LIMIT` is TopN. It needs the first N rows after ordering, so
it cannot be treated as a normal limit.
### 8. Join / Aggregate / Window / Sort Are Not Matched
```sql
WITH cte AS (
SELECT * FROM orders
)
SELECT *
FROM cte JOIN users ON cte.user_id = users.user_id
LIMIT 10;
```
```sql
WITH cte AS (
SELECT * FROM orders
)
SELECT user_id, COUNT(*)
FROM cte
GROUP BY user_id
LIMIT 10;
```
```sql
WITH cte AS (
SELECT * FROM orders
)
SELECT *
FROM (
SELECT order_id, ROW_NUMBER() OVER (ORDER BY order_id) AS rn
FROM cte
) t
LIMIT 10;
```
```sql
WITH cte AS (
SELECT * FROM orders
)
SELECT *
FROM (
SELECT * FROM cte ORDER BY order_id
) t
LIMIT 10;
```
These operators can change row cardinality or ordering semantics. Unless
other rules have already rewritten the shape into `Limit ->
CTEConsumer` or `Limit -> Project -> CTEConsumer`, this collector skips
them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]