Re: [PR] [opt](cloud) cache cluster id per query and drop redundant locks on getBackendId hot path [doris]

via GitHub Mon, 25 May 2026 16:54:45 -0700


github-actions[bot] commented on code in PR #63636:
URL: https://github.com/apache/doris/pull/63636#discussion_r3300371139



##########
fe/fe-core/src/main/java/org/apache/doris/service/FrontendServiceImpl.java:
##########
@@ -4465,6 +4466,21 @@ public TCreatePartitionResult 
createPartition(TCreatePartitionRequest request) t
         List<TTabletLocation> tablets = new ArrayList<>();
         List<TTabletLocation> slaveTablets = new ArrayList<>();
         List<TOlapTablePartition> partitions = Lists.newArrayList();
+        // In cloud mode without a BE endpoint hint, the cluster id resolved 
from ConnectContext
+        // is identical for every tablet below. Resolve once here to avoid 
running the full
+        // CloudReplica.getCurrentClusterId pipeline (ConnectContext lookup, 
priv check, status
+        // check, autoStart, existence check) per tablet across every 
partition and index.
+        String cachedClusterId = null;
+        if (Config.isCloudMode() && !request.isSetBeEndpoint()) {
+            try {

Review Comment:
   This has the same eager-resolution problem for the auto-partition cache 
path: the compute group is resolved before checking `getAutoPartitionInfo()`. 
If all requested partitions are already cached for the txn, the old code 
returned those cached tablet locations without touching 
`CloudReplica.getCurrentClusterId()`; after this change, a cache-only retry can 
fail because the current context/default cluster/auth/status is unavailable 
even though no new backend mapping is needed. Please move the resolution into 
the cache-miss path and reuse it after the first miss; the `replacePartition` 
hoist below needs the same treatment.



##########
fe/fe-core/src/main/java/org/apache/doris/planner/OlapTableSink.java:
##########
@@ -780,6 +782,20 @@ private List<TOlapTableLocationParam> createLocation(long 
dbId, OlapTable table)
         TOlapTableLocationParam slaveLocationParam = new 
TOlapTableLocationParam();
         // BE id -> path hash
         Multimap<Long, Long> allBePathsMap = HashMultimap.create();
+        // In cloud mode, the cluster id resolved from ConnectContext is 
identical for every
+        // tablet in this createLocation call. Resolve it ONCE here and reuse 
it for all
+        // tablets below; otherwise CloudReplica.getCurrentClusterId() would 
run its full
+        // pipeline (ConnectContext lookup, priv check, status check, 
autoStart, existence
+        // check, plus the read-lock CAS pair inside each) per tablet -- on 
multi-thousand
+        // tablet queries this was the single dominant FE hotspot.
+        String cachedClusterId = null;
+        if (Config.isCloudMode()) {
+            try {

Review Comment:
   This now resolves the cloud compute group before knowing whether 
`createLocation` needs any tablet backend map. The method still has the 
`partitionIds.isEmpty()` path below that returns empty tablet lists for 
partition-by-function/initial no-partition cases; before this PR that path 
never called `CloudReplica.getCurrentClusterId()`, while now cloud mode can 
fail on missing context/default cluster/auth/manual-shutdown even though no 
tablet location is needed. Please resolve lazily on the first tablet, or at 
least skip this when `partitionIds` is empty, so the no-location path preserves 
its previous behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [opt](cloud) cache cluster id per query and drop redundant locks on getBackendId hot path [doris]

Reply via email to