Copilot commented on code in PR #60952:
URL: https://github.com/apache/doris/pull/60952#discussion_r2871231339
##########
be/src/runtime_filter/runtime_filter_producer.h:
##########
@@ -131,8 +131,11 @@ class RuntimeFilterProducer : public RuntimeFilter {
_wrapper = wrapper;
}
- std::shared_ptr<RuntimeFilterWrapper> detect_in_filter() {
- if (_has_remote_target) {
+ std::shared_ptr<RuntimeFilterWrapper> detect_local_in_filter(RuntimeState*
state) {
+ // need merge mean this filter not pure local
+ // the data not directly colocated with scan node
+ // so that can not enable local in filter optimization
+ if (_need_do_merge(state)) {
return nullptr;
}
std::unique_lock<std::recursive_mutex> l(_rmtx);
Review Comment:
`detect_local_in_filter()` calls `_need_do_merge(state)` before acquiring
`_rmtx`, but `_need_do_merge()` dereferences `_wrapper` and reads
`_has_remote_target`. This breaks the class' locking discipline (all other
`_wrapper` accesses are under `_rmtx`) and could race with `publish()`
resetting `_wrapper` if these methods are ever called concurrently. Consider
taking `_rmtx` before calling `_need_do_merge`, or make `_need_do_merge` handle
locking / work off a local copy of `_wrapper` (and guard against `_wrapper ==
nullptr`).
##########
regression-test/suites/query_p0/join/test_cte_exists/test_cte_exists.groovy:
##########
@@ -0,0 +1,76 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_cte_exists") {
+ def tableName =
"table_20_50_undef_partitions2_keys3_properties4_distributed_by5"
+
+ sql """ DROP TABLE IF EXISTS ${tableName} """
+
+ sql "set enable_left_semi_direct_return_opt = true; "
+
+ sql "set parallel_pipeline_task_number=16;"
+
+ sql "set runtime_filter_max_in_num=10;"
+
+ sql "set runtime_filter_type = 'IN';"
Review Comment:
The regression case is described as reproducing a wrong result with **local
shuffle**, but this suite never explicitly enables local shuffle (e.g. `set
enable_local_shuffle=true` or `SET_VAR(enable_local_shuffle=true)` on the
query). If local shuffle is disabled by default in some environments, this test
may pass without exercising the buggy path. Consider enabling (and optionally
forcing) local shuffle for the query, and resetting it after the assertion to
avoid leaking session settings.
##########
be/src/runtime_filter/runtime_filter_wrapper.cpp:
##########
@@ -180,7 +180,7 @@ Status RuntimeFilterWrapper::merge(const
RuntimeFilterWrapper* other) {
_hybrid_set->insert(other->_hybrid_set.get());
if (_max_in_num >= 0 && _hybrid_set->size() > _max_in_num) {
_hybrid_set->clear();
- set_state(State::DISABLED, fmt::format("reach max in num: {}",
_max_in_num));
+ set_state(State::DISABLED, fmt::format("merge reach max in num:
{}", _max_in_num));
Review Comment:
The new disable reason string "merge reach max in num" is still a bit
unclear/unnatural and may be surfaced in logs/profiles. Consider rephrasing to
something like "merge reached max in num" (or mention the setting name) so it’s
obvious the filter was disabled because the merged IN-set exceeded the
threshold.
```suggestion
set_state(State::DISABLED,
fmt::format("merge reached max IN num threshold: {}",
_max_in_num));
```
##########
be/src/runtime_filter/runtime_filter_producer.h:
##########
@@ -157,6 +160,15 @@ class RuntimeFilterProducer : public RuntimeFilter {
}
}
+ bool _need_do_merge(RuntimeState* state) {
+ // two case we need do local merge:
Review Comment:
Minor grammar in the new comments: "need merge mean" / "so that can not" /
"two case" are a bit hard to read. Tweaking these to standard phrasing (e.g.,
"needing a merge means...", "cannot", "two cases") would make the intent
clearer for future maintainers.
```suggestion
// two cases where we need to do a local merge:
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]