This is an automated email from the ASF dual-hosted git repository.
panxiaolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new afb160157a9 [Improvement](scan) update limit pushdown to tablet reader
(#61713)
afb160157a9 is described below
commit afb160157a9e7236a3be520003188715ef99333e
Author: Pxl <[email protected]>
AuthorDate: Wed Apr 1 15:23:44 2026 +0800
[Improvement](scan) update limit pushdown to tablet reader (#61713)
This pull request implements a general limit pushdown optimization for
storage-layer scans on DUP_KEYS tables and on UNIQUE_KEYS tables with
Merge-on-Write (MOW). The optimization counts rows against the LIMIT
only after filters are applied, so a filtered scan never stops early
and returns fewer rows than the limit requires. The change is carefully
scoped to
avoid interfering with existing optimizations (like TopN or storage
merge paths) and includes comprehensive regression tests to validate
correctness across various scenarios.
**General Limit Pushdown Implementation:**
* Added a `general_read_limit` parameter to the storage layer
(`TabletReader`, `RowsetReaderContext`, and `VCollectIterator`) to
enforce a post-filter row limit in non-merge scan paths for DUP_KEYS and
UNIQUE_KEYS with MOW. The iterator now tracks the number of rows
returned and stops reading once the limit is reached, ensuring correct
LIMIT semantics even in the presence of filters.
[[1]](diffhunk://#diff-4d3f2b2a64e1dac3bc076994d3cd9a6c569f7a8c6ad5bc69425188ffccb4266bR494-R503)
[[2]](diffhunk://#diff-4d3f2b2a64e1dac3bc076994d3cd9a6c569f7a8c6ad5bc69425188ffccb4266bR518-R531)
[[3]](diffhunk://#diff-88c1391d53c67a5ab8475c103bb743a2a80cc282e0c752e8fc52b7aaeb2e0e35L95-R102)
[[4]](diffhunk://#diff-88c1391d53c67a5ab8475c103bb743a2a80cc282e0c752e8fc52b7aaeb2e0e35R258-R294)
[[5]](diffhunk://#diff-b7a74d17e270719c904e8c31fb1308cb82ec0b89459a245dce9cb5505021ac63R353-R358)
[[6]](diffhunk://#diff-0025ea2b49c400923787cd77990575f94ba5d925efc95cb16336fbaec92ebfb1R106-R109)
[[7]](diffhunk://#diff-979b99e23ca7a4bb73f5beaa00270494778545296929a3de64b2e87dc8299cd6R202-R203)
[[8]](diffhunk://#diff-48c14ad6bacc0a0ed497ad0cde34756af85aa98dcb7dc909609a288a503be0c9R216-R220)
**Testing and Validation:**
* Added a new regression test suite
(`test_general_limit_pushdown.groovy`) covering DUP_KEYS, UNIQUE_KEYS
(MOW), AGG_KEYS, and MOR tables. The tests verify correct row counts and
output for LIMIT queries with and without filters, as well as negative
cases where the optimization must not apply.
* Added corresponding expected output results for the new tests.
---
be/src/exec/scan/olap_scanner.cpp | 49 +++-
be/src/storage/iterator/vcollect_iterator.cpp | 46 +++-
be/src/storage/iterator/vcollect_iterator.h | 6 +
be/src/storage/rowset/rowset_reader_context.h | 4 +
be/src/storage/tablet/tablet_reader.cpp | 2 +
be/src/storage/tablet/tablet_reader.h | 5 +
.../query_p0/limit/test_general_limit_pushdown.out | 131 ++++++++++
.../limit/test_general_limit_pushdown.groovy | 287 +++++++++++++++++++++
8 files changed, 514 insertions(+), 16 deletions(-)
diff --git a/be/src/exec/scan/olap_scanner.cpp
b/be/src/exec/scan/olap_scanner.cpp
index 342b5febd59..2649300aff4 100644
--- a/be/src/exec/scan/olap_scanner.cpp
+++ b/be/src/exec/scan/olap_scanner.cpp
@@ -491,20 +491,41 @@ Status OlapScanner::_init_tablet_reader_params(
_tablet_reader_params.enable_mor_value_predicate_pushdown = true;
}
- // order by table keys optimization for topn
- // will only read head/tail of data file since it's already sorted by
keys
- if (olap_scan_node.__isset.sort_info &&
!olap_scan_node.sort_info.is_asc_order.empty()) {
- _limit = _local_state->limit_per_scanner();
- _tablet_reader_params.read_orderby_key = true;
- if (!olap_scan_node.sort_info.is_asc_order[0]) {
- _tablet_reader_params.read_orderby_key_reverse = true;
- }
- _tablet_reader_params.read_orderby_key_num_prefix_columns =
- olap_scan_node.sort_info.is_asc_order.size();
- _tablet_reader_params.read_orderby_key_limit = _limit;
-
- if (_tablet_reader_params.read_orderby_key_limit > 0 &&
- olap_scan_local_state->_storage_no_merge()) {
+ // Skip topn / general-limit storage-layer optimizations when runtime
+ // filters exist. Late-arriving filters would re-populate _conjuncts
+ // at the scanner level while the storage layer has already committed
+ // to a row budget counted before those filters, causing the scan to
+ // return fewer rows than the limit requires.
+ if (_total_rf_num == 0) {
+ // order by table keys optimization for topn
+ // will only read head/tail of data file since it's already sorted
by keys
+ if (olap_scan_node.__isset.sort_info &&
+ !olap_scan_node.sort_info.is_asc_order.empty()) {
+ _limit = _local_state->limit_per_scanner();
+ _tablet_reader_params.read_orderby_key = true;
+ if (!olap_scan_node.sort_info.is_asc_order[0]) {
+ _tablet_reader_params.read_orderby_key_reverse = true;
+ }
+ _tablet_reader_params.read_orderby_key_num_prefix_columns =
+ olap_scan_node.sort_info.is_asc_order.size();
+ _tablet_reader_params.read_orderby_key_limit = _limit;
+
+ if (_tablet_reader_params.read_orderby_key_limit > 0 &&
+ olap_scan_local_state->_storage_no_merge()) {
+ _tablet_reader_params.filter_block_conjuncts = _conjuncts;
+ _conjuncts.clear();
+ }
+ } else if (_limit > 0 &&
olap_scan_local_state->_storage_no_merge()) {
+ // General limit pushdown for DUP_KEYS and UNIQUE_KEYS with MOW
+ // (non-merge path). Only when topn optimization is NOT active.
+ // NOTE: _limit is the global query limit (TPlanNode.limit),
not a
+ // per-scanner budget. With N scanners each scanner may read
up to
+ // _limit rows, so up to N * _limit rows are read in total
before
+ // the _shared_scan_limit coordinator stops them. This is
+ // acceptable because _shared_scan_limit guarantees
correctness,
+ // and the over-read is bounded by (N-1) * _limit which is
small
+ // for typical LIMIT values.
+ _tablet_reader_params.general_read_limit = _limit;
_tablet_reader_params.filter_block_conjuncts = _conjuncts;
_conjuncts.clear();
}
diff --git a/be/src/storage/iterator/vcollect_iterator.cpp
b/be/src/storage/iterator/vcollect_iterator.cpp
index 4065efefb7e..12d12f8c6d0 100644
--- a/be/src/storage/iterator/vcollect_iterator.cpp
+++ b/be/src/storage/iterator/vcollect_iterator.cpp
@@ -92,7 +92,14 @@ void VCollectIterator::init(TabletReader* reader, bool
ori_data_overlapping, boo
_topn_limit = _reader->_reader_context.read_orderby_key_limit;
} else {
_topn_limit = 0;
- DCHECK_EQ(_reader->_reader_context.filter_block_conjuncts.size(), 0);
+ }
+
+ // General limit pushdown: only for non-merge path (DUP_KEYS or
UNIQUE_KEYS with MOW).
+ // The scanner already guards this with _storage_no_merge(), but we also
check !_merge
+ // here because _merge can be forced true by overlapping data
(force_merge), in which
+ // case limit pushdown is not safe.
+ if (!_merge && _reader->_reader_context.general_read_limit > 0) {
+ _general_read_limit = _reader->_reader_context.general_read_limit;
}
}
@@ -248,8 +255,43 @@ Status VCollectIterator::next(Block* block) {
return _topn_next(block);
}
+ // Fast path: if general limit already reached, return EOF immediately
+ if (_general_read_limit > 0 && _general_rows_returned >=
_general_read_limit) {
+ return Status::Error<END_OF_FILE>("");
+ }
+
if (LIKELY(_inner_iter)) {
- return _inner_iter->next(block);
+ auto st = _inner_iter->next(block);
+ if (UNLIKELY(!st.ok())) {
+ return st;
+ }
+
+ // Apply filter_block_conjuncts that were moved from
Scanner::_conjuncts.
+ // This must happen BEFORE limit counting so that
_general_rows_returned
+ // reflects post-filter rows (same pattern as _topn_next).
+ // Intentionally not gated by _general_read_limit > 0:
+ // filter_block_conjuncts is only populated when the general-limit or
+ // topn branches move conjuncts into the storage layer (topn takes
+ // the _topn_next path and never reaches here).
+ if (!_reader->_reader_context.filter_block_conjuncts.empty()) {
+ RETURN_IF_ERROR(VExprContext::filter_block(
+ _reader->_reader_context.filter_block_conjuncts, block,
block->columns()));
+ }
+
+ // Enforce general read limit: truncate block if needed
+ if (_general_read_limit > 0) {
+ _general_rows_returned += block->rows();
+ if (_general_rows_returned > _general_read_limit) {
+ // Truncate block to return exactly the remaining rows needed
+ int64_t excess = _general_rows_returned - _general_read_limit;
+ int64_t keep = block->rows() - excess;
+ DCHECK_GT(keep, 0);
+ block->set_num_rows(keep);
+ _general_rows_returned = _general_read_limit;
+ }
+ }
+
+ return Status::OK();
} else {
return Status::Error<END_OF_FILE>("");
}
diff --git a/be/src/storage/iterator/vcollect_iterator.h
b/be/src/storage/iterator/vcollect_iterator.h
index 4201546c048..2b1731a3503 100644
--- a/be/src/storage/iterator/vcollect_iterator.h
+++ b/be/src/storage/iterator/vcollect_iterator.h
@@ -350,6 +350,12 @@ private:
bool _topn_eof = false;
std::vector<RowSetSplits> _rs_splits;
+ // General limit pushdown for DUP_KEYS and UNIQUE_KEYS with MOW (non-merge
path).
+ // When > 0, VCollectIterator will stop reading after returning this many
rows.
+ int64_t _general_read_limit = -1;
+ // Number of rows already returned to the caller.
+ int64_t _general_rows_returned = 0;
+
// Hold reader point to access read params, such as fetch conditions.
TabletReader* _reader = nullptr;
diff --git a/be/src/storage/rowset/rowset_reader_context.h
b/be/src/storage/rowset/rowset_reader_context.h
index e44733367c8..2dde24e0e19 100644
--- a/be/src/storage/rowset/rowset_reader_context.h
+++ b/be/src/storage/rowset/rowset_reader_context.h
@@ -103,6 +103,10 @@ struct RowsetReaderContext {
// When true, push down value predicates for MOR tables
bool enable_mor_value_predicate_pushdown = false;
+
+ // General limit pushdown for DUP_KEYS and UNIQUE_KEYS with MOW.
+ // Propagated from ReaderParams.general_read_limit.
+ int64_t general_read_limit = -1;
};
} // namespace doris
diff --git a/be/src/storage/tablet/tablet_reader.cpp
b/be/src/storage/tablet/tablet_reader.cpp
index 03a637582d1..d0901a8076c 100644
--- a/be/src/storage/tablet/tablet_reader.cpp
+++ b/be/src/storage/tablet/tablet_reader.cpp
@@ -199,6 +199,8 @@ Status TabletReader::_capture_rs_readers(const
ReaderParams& read_params) {
_reader_context.all_access_paths = read_params.all_access_paths;
_reader_context.predicate_access_paths =
read_params.predicate_access_paths;
+ // Propagate general read limit for DUP_KEYS and UNIQUE_KEYS with MOW
+ _reader_context.general_read_limit = read_params.general_read_limit;
return Status::OK();
}
diff --git a/be/src/storage/tablet/tablet_reader.h
b/be/src/storage/tablet/tablet_reader.h
index 6f6683bfaa2..4af19b57c0a 100644
--- a/be/src/storage/tablet/tablet_reader.h
+++ b/be/src/storage/tablet/tablet_reader.h
@@ -213,6 +213,11 @@ public:
std::shared_ptr<segment_v2::AnnTopNRuntime> ann_topn_runtime;
uint64_t condition_cache_digest = 0;
+
+ // General limit pushdown for DUP_KEYS and UNIQUE_KEYS with MOW.
+ // When > 0, the storage layer (VCollectIterator) will stop reading
+ // after returning this many rows. -1 means no limit.
+ int64_t general_read_limit = -1;
};
TabletReader() = default;
diff --git
a/regression-test/data/query_p0/limit/test_general_limit_pushdown.out
b/regression-test/data/query_p0/limit/test_general_limit_pushdown.out
new file mode 100644
index 00000000000..2c9afe75210
--- /dev/null
+++ b/regression-test/data/query_p0/limit/test_general_limit_pushdown.out
@@ -0,0 +1,131 @@
+-- This file is automatically generated. You should know what you did if you
want to edit this
+-- !dup_basic_limit --
+1 99
+10 90
+2 98
+3 97
+4 96
+5 95
+6 94
+7 93
+8 92
+9 91
+
+-- !dup_filter_limit --
+11 89
+12 88
+13 87
+14 86
+15 85
+16 84
+17 83
+18 82
+19 81
+20 80
+21 79
+22 78
+23 77
+24 76
+25 75
+
+-- !dup_filter_over_limit --
+46 54
+47 53
+48 52
+49 51
+50 50
+
+-- !dup_complex_filter_limit --
+16 84
+17 83
+18 82
+19 81
+20 80
+21 79
+22 78
+23 77
+
+-- !mow_basic_limit --
+1 99
+10 90
+2 98
+3 97
+4 96
+5 95
+6 94
+7 93
+8 92
+9 91
+
+-- !mow_filter_limit --
+11 89
+12 88
+13 87
+14 86
+15 85
+16 84
+17 83
+18 82
+19 81
+20 80
+21 79
+22 78
+23 77
+24 76
+25 75
+
+-- !mow_complex_filter_limit --
+16 84
+17 83
+18 82
+19 81
+20 80
+21 79
+22 78
+23 77
+
+-- !agg_basic_limit --
+1 1 30
+2 2 70
+
+-- !mor_filter_limit --
+11 89
+12 88
+13 87
+14 86
+15 85
+16 84
+17 83
+18 82
+19 81
+20 80
+
+-- !mow_delete_limit --
+21 79
+22 78
+23 77
+24 76
+25 75
+26 74
+27 73
+28 72
+29 71
+30 70
+31 69
+32 68
+33 67
+34 66
+35 65
+
+-- !dup_limit_offset --
+10 90
+11 89
+12 88
+13 87
+14 86
+15 85
+6 94
+7 93
+8 92
+9 91
+
diff --git
a/regression-test/suites/query_p0/limit/test_general_limit_pushdown.groovy
b/regression-test/suites/query_p0/limit/test_general_limit_pushdown.groovy
new file mode 100644
index 00000000000..332d9870bce
--- /dev/null
+++ b/regression-test/suites/query_p0/limit/test_general_limit_pushdown.groovy
@@ -0,0 +1,287 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Test general limit pushdown to storage layer for DUP_KEYS and UNIQUE_KEYS
(MOW).
+// This exercises the non-topn limit path where VCollectIterator enforces
+// general_read_limit with filter_block_conjuncts applied before counting.
+
+suite("test_general_limit_pushdown") {
+
+ // ---- DUP_KEYS table ----
+ sql "DROP TABLE IF EXISTS dup_limit_pushdown"
+ sql """
+ CREATE TABLE dup_limit_pushdown (
+ k1 INT NOT NULL,
+ k2 INT NOT NULL,
+ v1 VARCHAR(100) NULL
+ )
+ DUPLICATE KEY(k1, k2)
+ DISTRIBUTED BY HASH(k1) BUCKETS 1
+ PROPERTIES ("replication_num" = "1");
+ """
+
+ // Insert 50 rows: k1 in [1..50], k2 = 100 - k1, v1 = 'val_<k1>'
+ StringBuilder sb = new StringBuilder()
+ sb.append("INSERT INTO dup_limit_pushdown VALUES ")
+ for (int i = 1; i <= 50; i++) {
+ if (i > 1) sb.append(",")
+ sb.append("(${i}, ${100 - i}, 'val_${i}')")
+ }
+ sql sb.toString()
+
+ // Basic LIMIT without ORDER BY — exercises general limit pushdown path.
+ // Use order_qt_ to ensure deterministic output ordering.
+ order_qt_dup_basic_limit """
+ SELECT k1, k2 FROM dup_limit_pushdown LIMIT 10
+ """
+
+ // LIMIT with WHERE clause — filter_block_conjuncts must be applied before
+ // limit counting, otherwise we may get fewer rows than requested.
+ // k1 > 10 matches 40 rows, LIMIT 15 should return exactly 15.
+ order_qt_dup_filter_limit """
+ SELECT k1, k2 FROM dup_limit_pushdown WHERE k1 > 10 LIMIT 15
+ """
+
+ // LIMIT larger than matching rows — should return all matching rows.
+ // k1 > 45 matches 5 rows, LIMIT 20 should return all 5.
+ order_qt_dup_filter_over_limit """
+ SELECT k1, k2 FROM dup_limit_pushdown WHERE k1 > 45 LIMIT 20
+ """
+
+ // LIMIT with complex predicate (function-based, may not push into storage
predicates).
+ // This exercises the filter_block_conjuncts path for predicates that
remain as conjuncts.
+ order_qt_dup_complex_filter_limit """
+ SELECT k1, k2 FROM dup_limit_pushdown WHERE abs(k1 - 25) < 10 LIMIT 8
+ """
+
+ // ---- UNIQUE_KEYS with MOW table ----
+ sql "DROP TABLE IF EXISTS mow_limit_pushdown"
+ sql """
+ CREATE TABLE mow_limit_pushdown (
+ k1 INT NOT NULL,
+ k2 INT NOT NULL,
+ v1 VARCHAR(100) NULL
+ )
+ UNIQUE KEY(k1, k2)
+ DISTRIBUTED BY HASH(k1) BUCKETS 1
+ PROPERTIES (
+ "replication_num" = "1",
+ "enable_unique_key_merge_on_write" = "true"
+ );
+ """
+
+ sb = new StringBuilder()
+ sb.append("INSERT INTO mow_limit_pushdown VALUES ")
+ for (int i = 1; i <= 50; i++) {
+ if (i > 1) sb.append(",")
+ sb.append("(${i}, ${100 - i}, 'val_${i}')")
+ }
+ sql sb.toString()
+
+ // Basic LIMIT without ORDER BY on MOW table.
+ order_qt_mow_basic_limit """
+ SELECT k1, k2 FROM mow_limit_pushdown LIMIT 10
+ """
+
+ // LIMIT with WHERE on MOW table.
+ order_qt_mow_filter_limit """
+ SELECT k1, k2 FROM mow_limit_pushdown WHERE k1 > 10 LIMIT 15
+ """
+
+ // LIMIT with complex predicate on MOW table.
+ order_qt_mow_complex_filter_limit """
+ SELECT k1, k2 FROM mow_limit_pushdown WHERE abs(k1 - 25) < 10 LIMIT 8
+ """
+
+ // ---- Verify row count correctness with COUNT ----
+ // These verify the LIMIT returns the expected number of rows,
+ // protecting against the pre-filter counting bug.
+
+ def dup_count = sql """
+ SELECT COUNT(*) FROM (
+ SELECT k1 FROM dup_limit_pushdown WHERE k1 > 10 LIMIT 15
+ ) t
+ """
+ assert dup_count[0][0] == 15 : "DUP_KEYS: expected 15 rows with WHERE
k1>10 LIMIT 15, got ${dup_count[0][0]}"
+
+ def mow_count = sql """
+ SELECT COUNT(*) FROM (
+ SELECT k1 FROM mow_limit_pushdown WHERE k1 > 10 LIMIT 15
+ ) t
+ """
+ assert mow_count[0][0] == 15 : "MOW: expected 15 rows with WHERE k1>10
LIMIT 15, got ${mow_count[0][0]}"
+
+ // With complex predicate (abs(k1-25) < 10 => k1 in 16..34, 19 rows; LIMIT
8 should return 8)
+ def dup_complex_count = sql """
+ SELECT COUNT(*) FROM (
+ SELECT k1 FROM dup_limit_pushdown WHERE abs(k1 - 25) < 10 LIMIT 8
+ ) t
+ """
+ assert dup_complex_count[0][0] == 8 : "DUP_KEYS complex filter: expected 8
rows, got ${dup_complex_count[0][0]}"
+
+ // ---- AGG_KEYS table (negative test: optimization must NOT apply) ----
+ sql "DROP TABLE IF EXISTS agg_limit_pushdown"
+ sql """
+ CREATE TABLE agg_limit_pushdown (
+ k1 INT NOT NULL,
+ k2 INT NOT NULL,
+ v1 INT SUM NOT NULL
+ )
+ AGGREGATE KEY(k1, k2)
+ DISTRIBUTED BY HASH(k1) BUCKETS 1
+ PROPERTIES ("replication_num" = "1");
+ """
+ // Insert duplicate keys so aggregation is required
+ sql "INSERT INTO agg_limit_pushdown VALUES (1, 1, 10), (1, 1, 20), (2, 2,
30), (2, 2, 40), (3, 3, 50)"
+ // After aggregation: (1,1,30), (2,2,70), (3,3,50)
+ def agg_count = sql """
+ SELECT COUNT(*) FROM (
+ SELECT k1, k2, v1 FROM agg_limit_pushdown LIMIT 2
+ ) t
+ """
+ assert agg_count[0][0] == 2 : "AGG_KEYS: expected 2 rows, got
${agg_count[0][0]}"
+ order_qt_agg_basic_limit """
+ SELECT k1, k2, v1 FROM agg_limit_pushdown LIMIT 2
+ """
+
+ // ---- MOR UNIQUE_KEYS table (negative test: optimization must NOT apply)
----
+ sql "DROP TABLE IF EXISTS mor_limit_pushdown"
+ sql """
+ CREATE TABLE mor_limit_pushdown (
+ k1 INT NOT NULL,
+ k2 INT NOT NULL,
+ v1 VARCHAR(100) NULL
+ )
+ UNIQUE KEY(k1, k2)
+ DISTRIBUTED BY HASH(k1) BUCKETS 1
+ PROPERTIES (
+ "replication_num" = "1",
+ "enable_unique_key_merge_on_write" = "false"
+ );
+ """
+ sb = new StringBuilder()
+ sb.append("INSERT INTO mor_limit_pushdown VALUES ")
+ for (int i = 1; i <= 30; i++) {
+ if (i > 1) sb.append(",")
+ sb.append("(${i}, ${100 - i}, 'val_${i}')")
+ }
+ sql sb.toString()
+ def mor_count = sql """
+ SELECT COUNT(*) FROM (
+ SELECT k1, k2 FROM mor_limit_pushdown WHERE k1 > 10 LIMIT 10
+ ) t
+ """
+ assert mor_count[0][0] == 10 : "MOR UNIQUE_KEYS: expected 10 rows, got
${mor_count[0][0]}"
+ order_qt_mor_filter_limit """
+ SELECT k1, k2 FROM mor_limit_pushdown WHERE k1 > 10 LIMIT 10
+ """
+
+ // ---- MOW with DELETEs ----
+ // Verify __DORIS_DELETE_SIGN__ predicate (in _conjuncts) is correctly
+ // handled after being moved to filter_block_conjuncts.
+ sql "DROP TABLE IF EXISTS mow_delete_limit_pushdown"
+ sql """
+ CREATE TABLE mow_delete_limit_pushdown (
+ k1 INT NOT NULL,
+ k2 INT NOT NULL,
+ v1 VARCHAR(100) NULL
+ )
+ UNIQUE KEY(k1, k2)
+ DISTRIBUTED BY HASH(k1) BUCKETS 1
+ PROPERTIES (
+ "replication_num" = "1",
+ "enable_unique_key_merge_on_write" = "true"
+ );
+ """
+ sb = new StringBuilder()
+ sb.append("INSERT INTO mow_delete_limit_pushdown VALUES ")
+ for (int i = 1; i <= 50; i++) {
+ if (i > 1) sb.append(",")
+ sb.append("(${i}, ${100 - i}, 'val_${i}')")
+ }
+ sql sb.toString()
+ // Delete rows where k1 <= 20, leaving 30 rows (k1 in 21..50)
+ sql "DELETE FROM mow_delete_limit_pushdown WHERE k1 <= 20"
+ def mow_del_count = sql """
+ SELECT COUNT(*) FROM (
+ SELECT k1, k2 FROM mow_delete_limit_pushdown LIMIT 15
+ ) t
+ """
+ assert mow_del_count[0][0] == 15 : "MOW with deletes: expected 15 rows,
got ${mow_del_count[0][0]}"
+ order_qt_mow_delete_limit """
+ SELECT k1, k2 FROM mow_delete_limit_pushdown LIMIT 15
+ """
+ // With filter + delete: k1 > 30 matches 20 rows (31..50), LIMIT 10
+ def mow_del_filter_count = sql """
+ SELECT COUNT(*) FROM (
+ SELECT k1, k2 FROM mow_delete_limit_pushdown WHERE k1 > 30 LIMIT 10
+ ) t
+ """
+ assert mow_del_filter_count[0][0] == 10 : "MOW delete+filter: expected 10
rows, got ${mow_del_filter_count[0][0]}"
+
+ // ---- Multiple buckets/tablets ----
+ // Exercises per-scanner limit vs global limit coordination with multiple
scanners.
+ sql "DROP TABLE IF EXISTS dup_multi_bucket_limit"
+ sql """
+ CREATE TABLE dup_multi_bucket_limit (
+ k1 INT NOT NULL,
+ k2 INT NOT NULL,
+ v1 VARCHAR(100) NULL
+ )
+ DUPLICATE KEY(k1, k2)
+ DISTRIBUTED BY HASH(k1) BUCKETS 8
+ PROPERTIES ("replication_num" = "1");
+ """
+ sb = new StringBuilder()
+ sb.append("INSERT INTO dup_multi_bucket_limit VALUES ")
+ for (int i = 1; i <= 200; i++) {
+ if (i > 1) sb.append(",")
+ sb.append("(${i}, ${1000 - i}, 'val_${i}')")
+ }
+ sql sb.toString()
+ def multi_count = sql """
+ SELECT COUNT(*) FROM (
+ SELECT k1, k2 FROM dup_multi_bucket_limit LIMIT 20
+ ) t
+ """
+ assert multi_count[0][0] == 20 : "Multi-bucket: expected 20 rows, got
${multi_count[0][0]}"
+ def multi_filter_count = sql """
+ SELECT COUNT(*) FROM (
+ SELECT k1, k2 FROM dup_multi_bucket_limit WHERE k1 > 100 LIMIT 15
+ ) t
+ """
+ assert multi_filter_count[0][0] == 15 : "Multi-bucket filter: expected 15
rows, got ${multi_filter_count[0][0]}"
+
+ // ---- LIMIT + OFFSET ----
+ // Verify OFFSET interacts correctly with general limit pushdown.
+ def offset_count = sql """
+ SELECT COUNT(*) FROM (
+ SELECT k1, k2 FROM dup_limit_pushdown LIMIT 10 OFFSET 5
+ ) t
+ """
+ assert offset_count[0][0] == 10 : "LIMIT+OFFSET: expected 10 rows, got
${offset_count[0][0]}"
+ order_qt_dup_limit_offset """
+ SELECT k1, k2 FROM dup_limit_pushdown LIMIT 10 OFFSET 5
+ """
+ // OFFSET beyond matching rows: k1 > 45 matches 5 rows, OFFSET 3 LIMIT 10
should return 2
+ def offset_over_count = sql """
+ SELECT COUNT(*) FROM (
+ SELECT k1, k2 FROM dup_limit_pushdown WHERE k1 > 45 LIMIT 10
OFFSET 3
+ ) t
+ """
+ assert offset_over_count[0][0] == 2 : "LIMIT+OFFSET over: expected 2 rows,
got ${offset_over_count[0][0]}"
+}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]