github-actions[bot] commented on code in PR #63533:
URL: https://github.com/apache/doris/pull/63533#discussion_r3301884953
##########
cloud/src/meta-service/meta_service_resource.cpp:
##########
@@ -1491,6 +1641,115 @@ void
MetaServiceImpl::alter_storage_vault(google::protobuf::RpcController* contr
code = cast_as<ErrCategory::COMMIT>(err);
msg = fmt::format("failed to commit kv txn, err={}", err);
LOG(WARNING) << msg;
+ return;
+ }
+
+ async_notify_refresh_instance(txn_kv_, instance_id, true);
+
+ if (!supports_cascade) {
+ return;
+ }
+
+ if (!instance.has_snapshot_switch_status() ||
+ instance.snapshot_switch_status() == SNAPSHOT_SWITCH_DISABLED) {
+ LOG(INFO) << "snapshot disabled for instance_id=" << instance_id
+ << ", skip cascade updating derived instances after
alter_storage_vault";
+ return;
+ }
+
+ std::vector<std::string> cascade_instance_ids;
+ if (find_cascade_instances(txn_kv_.get(), instance_id,
&cascade_instance_ids) != 0) {
+ LOG(WARNING) << "failed to find derived instances for storage vault
cascade, instance_id="
+ << instance_id;
+ return;
+ }
+
+ for (const auto& cascade_id : cascade_instance_ids) {
+ std::unique_ptr<Transaction> cascade_txn;
+ TxnErrorCode cascade_err = txn_kv_->create_txn(&cascade_txn);
+ if (cascade_err != TxnErrorCode::TXN_OK) {
+ code = cast_as<ErrCategory::CREATE>(cascade_err);
+ msg = fmt::format(
Review Comment:
This reports a derived-instance cascade failure after the root storage-vault
transaction has already been committed above. A transient error here leaves the
root, and possibly earlier descendants, updated while the RPC returns failure
to the caller. Retrying is not safe for operations like renames because the
root no longer has the old vault name, and the same post-commit error pattern
is used by the newly added `alter_obj_store_info` cascade below. Please either
keep post-commit cascade failures best-effort/retriable in the background, or
make the whole operation transactional/compensated before returning an error.
##########
cloud/src/meta-service/meta_service_resource.cpp:
##########
@@ -1845,9 +2159,11 @@ void
MetaServiceImpl::update_ak_sk(google::protobuf::RpcController* controller,
std::unique_ptr<Transaction> cascade_txn;
TxnErrorCode cascade_err = txn_kv_->create_txn(&cascade_txn);
if (cascade_err != TxnErrorCode::TXN_OK) {
- LOG(WARNING) << "failed to create txn for derived instance,
instance_id=" << cascade_id
- << " err=" << cascade_err;
- continue;
+ code = cast_as<ErrCategory::CREATE>(cascade_err);
+ msg = fmt::format("failed to create txn for derived instance,
instance_id={}, err={}",
Review Comment:
This changes the existing AK/SK cascade from best-effort `continue` to
returning an RPC error after the root instance was already committed. If a
derived update fails, clients see failure even though the root update
succeeded; a retry can then hit `ak sk eq original` on the root and still not
repair the skipped descendants. Please preserve the previous
success/best-effort semantics or add a retry/reconciliation mechanism instead
of returning errors after partial commits.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]