github-actions[bot] commented on code in PR #63636:
URL: https://github.com/apache/doris/pull/63636#discussion_r3300371139
##########
fe/fe-core/src/main/java/org/apache/doris/service/FrontendServiceImpl.java:
##########
@@ -4465,6 +4466,21 @@ public TCreatePartitionResult
createPartition(TCreatePartitionRequest request) t
List<TTabletLocation> tablets = new ArrayList<>();
List<TTabletLocation> slaveTablets = new ArrayList<>();
List<TOlapTablePartition> partitions = Lists.newArrayList();
+ // In cloud mode without a BE endpoint hint, the cluster id resolved
from ConnectContext
+ // is identical for every tablet below. Resolve once here to avoid
running the full
+ // CloudReplica.getCurrentClusterId pipeline (ConnectContext lookup,
priv check, status
+ // check, autoStart, existence check) per tablet across every
partition and index.
+ String cachedClusterId = null;
+ if (Config.isCloudMode() && !request.isSetBeEndpoint()) {
+ try {
Review Comment:
This has the same eager-resolution problem for the auto-partition cache
path: the compute group is resolved before checking `getAutoPartitionInfo()`.
If all requested partitions are already cached for the txn, the old code
returned those cached tablet locations without touching
`CloudReplica.getCurrentClusterId()`; after this change, a cache-only retry can
fail because the current context/default cluster/auth/status is unavailable
even though no new backend mapping is needed. Please move the resolution into
the cache-miss path and reuse it after the first miss; the `replacePartition`
hoist below needs the same treatment.
##########
fe/fe-core/src/main/java/org/apache/doris/planner/OlapTableSink.java:
##########
@@ -780,6 +782,20 @@ private List<TOlapTableLocationParam> createLocation(long
dbId, OlapTable table)
TOlapTableLocationParam slaveLocationParam = new
TOlapTableLocationParam();
// BE id -> path hash
Multimap<Long, Long> allBePathsMap = HashMultimap.create();
+ // In cloud mode, the cluster id resolved from ConnectContext is
identical for every
+ // tablet in this createLocation call. Resolve it ONCE here and reuse
it for all
+ // tablets below; otherwise CloudReplica.getCurrentClusterId() would
run its full
+ // pipeline (ConnectContext lookup, priv check, status check,
autoStart, existence
+ // check, plus the read-lock CAS pair inside each) per tablet -- on
multi-thousand
+ // tablet queries this was the single dominant FE hotspot.
+ String cachedClusterId = null;
+ if (Config.isCloudMode()) {
+ try {
Review Comment:
This now resolves the cloud compute group before knowing whether
`createLocation` needs any tablet backend map. The method still has the
`partitionIds.isEmpty()` path below that returns empty tablet lists for
partition-by-function/initial no-partition cases; before this PR that path
never called `CloudReplica.getCurrentClusterId()`, while now cloud mode can
fail on missing context/default cluster/auth/manual-shutdown even though no
tablet location is needed. Please resolve lazily on the first tablet, or at
least skip this when `partitionIds` is empty, so the no-location path preserves
its previous behavior.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]