github-actions[bot] commented on code in PR #61713:
URL: https://github.com/apache/doris/pull/61713#discussion_r2988055023
##########
be/src/storage/iterator/vcollect_iterator.cpp:
##########
@@ -248,8 +255,39 @@ Status VCollectIterator::next(Block* block) {
return _topn_next(block);
}
Review Comment:
**[Medium - Performance]** The `filter_block_conjuncts` check and limit-enforcement
logic now run on the hot path for **all** non-topn queries, even when
`_general_read_limit` is not active (i.e., `-1`). Each individual check is cheap
(an empty-vector check plus an integer comparison), but together they add overhead
to every `next(Block*)` call for queries without a LIMIT.
Consider guarding the entire filter+limit block:
```cpp
if (_general_read_limit > 0) {
// Apply filter_block_conjuncts ...
if (!_reader->_reader_context.filter_block_conjuncts.empty()) {
RETURN_IF_ERROR(VExprContext::filter_block(...));
}
// Enforce limit ...
_general_rows_returned += block->rows();
if (_general_rows_returned > _general_read_limit) { ... }
}
```
This makes the intent clearer and avoids any overhead when the feature is
not active.
##########
regression-test/suites/query_p0/limit/test_general_limit_pushdown.groovy:
##########
@@ -0,0 +1,135 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Test general limit pushdown to storage layer for DUP_KEYS and UNIQUE_KEYS (MOW).
+// This exercises the non-topn limit path where VCollectIterator enforces
+// general_read_limit with filter_block_conjuncts applied before counting.
+
+suite("test_general_limit_pushdown") {
+
+ // ---- DUP_KEYS table ----
+ sql "DROP TABLE IF EXISTS dup_limit_pushdown"
+ sql """
+ CREATE TABLE dup_limit_pushdown (
+ k1 INT NOT NULL,
+ k2 INT NOT NULL,
+ v1 VARCHAR(100) NULL
+ )
+ DUPLICATE KEY(k1, k2)
+ DISTRIBUTED BY HASH(k1) BUCKETS 1
Review Comment:
**[Medium - Test Coverage]** The tests only use `BUCKETS 1`, which means a
single scanner per query. The most interesting scenario for this optimization
is **multi-bucket tables** where multiple scanners each independently apply the
limit, and the outer pipeline limit must correctly truncate to the global
LIMIT. Consider adding a test with `BUCKETS 4` or more and a larger dataset to
exercise the multi-scanner path.
Also consider adding:
1. **Negative test for AGG_KEYS table**: Verify limit pushdown does NOT
activate (AGG_KEYS requires merge, so `_storage_no_merge()` returns false).
2. **Negative test for UNIQUE_KEYS MOR (without MOW)**: Same reasoning.
3. **LIMIT with OFFSET**: `SELECT ... LIMIT 5 OFFSET 10` to verify
correctness when offset is involved.
4. **LIMIT 0**: Edge case that should return no rows.
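A rough sketch of what the multi-bucket and edge-case additions might look like. Everything here is hypothetical: the table name, the dataset size, and the use of the `numbers` table function are assumptions about the regression environment, not part of the PR.
```groovy
// Hypothetical sketch only; table name, data volume, and helpers are illustrative.
suite("test_general_limit_pushdown_multi_bucket") {
    sql "DROP TABLE IF EXISTS dup_limit_multi_bucket"
    sql """
        CREATE TABLE dup_limit_multi_bucket (
            k1 INT NOT NULL,
            v1 VARCHAR(100) NULL
        )
        DUPLICATE KEY(k1)
        DISTRIBUTED BY HASH(k1) BUCKETS 4
        PROPERTIES ("replication_num" = "1")
    """
    // Larger dataset so every bucket holds rows and multiple scanners run.
    sql """
        INSERT INTO dup_limit_multi_bucket
        SELECT number, concat('v', number) FROM numbers("number" = "10000")
    """

    // Global LIMIT must truncate across all scanners, not per scanner.
    def limited = sql "SELECT * FROM dup_limit_multi_bucket LIMIT 5"
    assertEquals(5, limited.size())

    // LIMIT with OFFSET: row count must still be exact.
    def offset = sql "SELECT * FROM dup_limit_multi_bucket LIMIT 5 OFFSET 10"
    assertEquals(5, offset.size())

    // LIMIT 0 edge case: no rows returned.
    def empty = sql "SELECT * FROM dup_limit_multi_bucket LIMIT 0"
    assertEquals(0, empty.size())
}
```
The negative tests for AGG_KEYS and UNIQUE_KEYS MOR would follow the same shape with the corresponding table model, asserting correct results (the pushdown simply should not activate, which is best verified via profile or explain output).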
##########
be/src/exec/scan/olap_scanner.cpp:
##########
@@ -502,6 +502,14 @@ Status OlapScanner::_init_tablet_reader_params(
_tablet_reader_params.filter_block_conjuncts = _conjuncts;
_conjuncts.clear();
}
+ } else if (_limit > 0 && olap_scan_local_state->_storage_no_merge()) {
Review Comment:
**[Note - Correctness analysis]** `_limit` here is `tnode.limit` (the full
query LIMIT), not a per-scanner fraction. This is correct: each scanner
independently limits its storage reads to N rows, and the outer
`ScanOperatorX::get_block()` -> `reached_limit()` enforces the global N across
all scanners. This matches the existing topn pattern.
However, note that for queries like `SELECT * FROM t WHERE rare_condition
LIMIT 10` with many tablets, every scanner reads up to 10 post-filter rows even
though only 10 total are needed; with 100 scanners, that is up to 1,000 rows read
for a LIMIT 10 query. The `adaptive_pipeline_task_serial_read_on_limit`
optimization mitigates this for small limits by forcing a serial scan, but for
larger limits the amplification factor is O(num_scanners). This is a known
trade-off, not a bug.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]