github-actions[bot] commented on code in PR #62661:
URL: https://github.com/apache/doris/pull/62661#discussion_r3417688818
##########
be/src/cloud/cloud_tablets_channel.cpp:
##########
@@ -64,6 +64,25 @@ Status CloudTabletsChannel::add_batch(const PTabletWriterAddBlockRequest& reques
return Status::OK();
}
+ if (request.is_receiver_side_random_bucket()) {
+ std::unordered_map<int64_t, DorisVector<uint32_t>> partition_to_rowidxs;
+ RETURN_IF_ERROR(_build_partition_to_rowidxs_for_receiver_side_random_bucket(
+ request, &partition_to_rowidxs));
+ if (!partition_to_rowidxs.empty()) {
+ std::unordered_set<int64_t> partition_ids;
+ partition_ids.reserve(partition_to_rowidxs.size());
+ for (const auto& [partition_id, _] : partition_to_rowidxs) {
+ partition_ids.insert(partition_id);
+ }
+ {
+ std::lock_guard<std::mutex> l(_tablet_writers_lock);
+ RETURN_IF_ERROR(_init_writers_by_partition_ids(partition_ids));
Review Comment:
This adaptive cloud path initializes every writer in each touched partition
before the receiver chooses the current adaptive tablet. In the normal cloud
path, when `skip_writing_empty_rowset_metadata` is true (the default), only
writers that actually receive rows are `batch_init`'d; untouched writers stay
uninitialized so close goes through `_commit_empty_rowset()` and skips writing
empty rowset metadata. Here a batch with one row for partition P calls
`_init_writers_by_partition_ids(P)`, so all tablet writers for P become
`is_init=true`. At close, writers that never received rows take the normal
initialized `commit_rowset()` path instead of the skip-empty path, creating
empty rowset metadata and rowset builders for every bucket on the receiver.
That negates the cloud lazy/skip-empty optimization and can reintroduce the
per-partition memory/metadata blow-up adaptive routing is trying to avoid.
Please initialize only the adaptive tablet writer(s) selected for this
request. The partition's other writers should stay uninitialized until they
actually receive rows; if they never do, close should handle them through the
existing empty-rowset handling.