Copilot commented on code in PR #61713:
URL: https://github.com/apache/doris/pull/61713#discussion_r2986605339


##########
be/src/storage/rowset/beta_rowset_reader.cpp:
##########
@@ -116,6 +116,7 @@ Status 
BetaRowsetReader::get_segment_iterators(RowsetReaderContext* read_context
     _read_options.version = _rowset->version();
     _read_options.tablet_id = _rowset->rowset_meta()->tablet_id();
     _read_options.topn_limit = _topn_limit;
+    _read_options.general_read_limit = _read_context->general_read_limit;

Review Comment:
   `StorageReadOptions::general_read_limit` is introduced and propagated down 
to `_read_options` here, but there is currently no consumer of this option 
anywhere in storage/segment iterators (searching the repo shows it is only 
written, never read). This adds API surface without behavior change and can 
mislead future readers; either wire it into the segment/rowset iterator 
implementations to actually stop IO early, or remove the unused option until 
it’s implemented.
   ```suggestion
   
   ```



##########
be/src/storage/iterator/vcollect_iterator.cpp:
##########
@@ -94,6 +94,16 @@ void VCollectIterator::init(TabletReader* reader, bool 
ori_data_overlapping, boo
         _topn_limit = 0;
         DCHECK_EQ(_reader->_reader_context.filter_block_conjuncts.size(), 0);
     }
+
+    // General limit pushdown: only for READER_QUERY on non-merge path
+    // (DUP_KEYS or UNIQUE_KEYS with MOW)
+    if (!_merge && _reader->_reader_type == ReaderType::READER_QUERY &&
+        _reader->_reader_context.general_read_limit > 0 &&
+        (_reader->_tablet->keys_type() == KeysType::DUP_KEYS ||
+         (_reader->_tablet->keys_type() == KeysType::UNIQUE_KEYS &&
+          _reader->_tablet->enable_unique_key_merge_on_write()))) {
+        _general_read_limit = _reader->_reader_context.general_read_limit;
+    }

Review Comment:
   `VCollectIterator::init()` sets `_general_read_limit` only when the 
condition matches, but never resets `_general_read_limit` / 
`_general_rows_returned` when the condition does NOT match. If the same 
`VCollectIterator` instance is re-initialized (e.g. a `BlockReader` reused 
across scan ranges), a previously set limit can leak into the next scan and 
cause early EOF / truncated results. Reset these members to defaults at the 
start of `init()` (and reset `_general_rows_returned` whenever a new limit is 
applied).



##########
be/src/storage/iterators.h:
##########
@@ -152,6 +152,10 @@ class StorageReadOptions {
     std::unordered_map<int32_t, ColumnPtr> sparse_column_cache;
 
     uint64_t condition_cache_digest = 0;
+
+    // General limit pushdown for DUP_KEYS and UNIQUE_KEYS with MOW.
+    // Propagated from RowsetReaderContext.general_read_limit.
+    int64_t general_read_limit = -1;

Review Comment:
   This new `general_read_limit` field is currently not used by any downstream 
code (repo-wide search shows it is only assigned, never read). Keeping an 
unused knob in `StorageReadOptions` increases maintenance cost and suggests a 
pushdown that doesn’t actually happen; either implement enforcement in the 
relevant iterators (e.g. `SegmentIterator`/`RowwiseIterator`) or remove it 
until enforcement is in place.
   ```suggestion
   
   ```



##########
be/src/storage/iterator/vcollect_iterator.cpp:
##########
@@ -248,8 +258,31 @@ Status VCollectIterator::next(Block* block) {
         return _topn_next(block);
     }
 
+    // Fast path: if general limit already reached, return EOF immediately
+    if (_general_read_limit > 0 && _general_rows_returned >= 
_general_read_limit) {
+        return Status::Error<END_OF_FILE>("");
+    }
+
     if (LIKELY(_inner_iter)) {
-        return _inner_iter->next(block);
+        auto st = _inner_iter->next(block);
+        if (UNLIKELY(!st.ok())) {
+            return st;
+        }
+
+        // Enforce general read limit: truncate block if needed
+        if (_general_read_limit > 0) {
+            _general_rows_returned += block->rows();
+            if (_general_rows_returned > _general_read_limit) {
+                // Truncate block to return exactly the remaining rows needed
+                int64_t excess = _general_rows_returned - _general_read_limit;
+                int64_t keep = block->rows() - excess;
+                DCHECK_GT(keep, 0);
+                block->set_num_rows(keep);
+                _general_rows_returned = _general_read_limit;
+            }
+        }
+
+        return Status::OK();

Review Comment:
   The new general-limit enforcement in `VCollectIterator::next()` changes scan 
termination/truncation behavior but doesn’t appear to be covered by unit tests. 
Please add a BE test that exercises DUP_KEYS and UNIQUE_KEYS(MOW) non-merge 
reads with a small limit, verifying that blocks are truncated to the remaining 
rows and that subsequent calls return EOF (and also that `general_read_limit <= 
0` keeps existing behavior).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to