Copilot commented on code in PR #60248:
URL: https://github.com/apache/doris/pull/60248#discussion_r2729008328


##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -148,20 +170,52 @@ private static AwsCredentialsProvider 
getAwsCredencialsProviderV1(URI endpoint,
         }
 
         if (!Strings.isNullOrEmpty(roleArn)) {
-            StsClient stsClient = StsClient.builder()
-                    .credentialsProvider(DefaultCredentialsProvider.create())
-                    .build();
-
-            return StsAssumeRoleCredentialsProvider.builder()
-                    .stsClient(stsClient)
-                    .refreshRequest(builder -> {
-                        
builder.roleArn(roleArn).roleSessionName("aws-sdk-java-v2-fe");
-                        if (!Strings.isNullOrEmpty(externalId)) {
-                            builder.externalId(externalId);
-                        }
-                    }).build();
+            // Cache role-based credentials by roleArn:externalId
+            String cacheKey = "v1:" + roleArn + ":" + (externalId != null ? 
externalId : "");
+            return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+                StsClient stsClient = StsClient.builder()
+                        
.credentialsProvider(getCachedDefaultCredentialsProvider())
+                        .build();

Review Comment:
   The StsClient instances created within the cached credential providers are 
never explicitly closed, potentially leading to resource leaks. The StsClient 
implements AutoCloseable and should be closed to release its underlying HTTP 
client resources. Consider managing the lifecycle of StsClient instances 
properly, either by caching them separately or ensuring they are closed when 
the credential provider is no longer needed.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/common/AwsCredentialsProviderFactory.java:
##########
@@ -33,19 +33,38 @@
 
 import java.util.ArrayList;
 import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
 
 public final class AwsCredentialsProviderFactory {
 
     private AwsCredentialsProviderFactory() {
     }
 
+    // Cache for AWS SDK V2 credential providers to prevent thread leak
+    // Each credential provider type creates internal ScheduledExecutorService 
threads
+    // for credential refresh. Without caching, every call to createV2() 
creates new
+    // threads that are never cleaned up, causing memory exhaustion.
+    // Key format: "mode:includeAnonymous" for DEFAULT mode, "mode" for others
+    private static final ConcurrentMap<String, AwsCredentialsProvider> 
V2_PROVIDER_CACHE =
+            new ConcurrentHashMap<>();

Review Comment:
   The credential provider caches use ConcurrentHashMap with no eviction 
policy, which means cached providers will remain in memory indefinitely. This 
could lead to memory accumulation over time if many different credential 
configurations are used throughout the application lifecycle. Consider 
implementing a cache eviction strategy using a bounded cache implementation 
like Caffeine or Guava's Cache with time-based or size-based eviction.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/storage/S3Properties.java:
##########
@@ -334,20 +362,25 @@ private AwsCredentialsProvider 
getAwsCredentialsProviderV2() {
             return credentialsProvider;
         }
         if (StringUtils.isNotBlank(s3IAMRole)) {
-            StsClient stsClient = StsClient.builder()
-                    .region(Region.of(region))
-                    
.credentialsProvider(AwsCredentialsProviderFactory.createV2(
-                            awsCredentialsProviderMode,
-                            false))
-                    .build();
-            return StsAssumeRoleCredentialsProvider.builder()
-                    .stsClient(stsClient)
-                    .refreshRequest(builder -> {
-                        
builder.roleArn(s3IAMRole).roleSessionName("aws-sdk-java-v2-fe");
-                        if (StringUtils.isNotBlank(s3ExternalId)) {
-                            builder.externalId(s3ExternalId);
-                        }
-                    }).build();
+            // Cache role-based credentials by region:roleArn:externalId:mode
+            String cacheKey = "v2:" + region + ":" + s3IAMRole + ":"
+                    + (s3ExternalId != null ? s3ExternalId : "") + ":" + 
awsCredentialsProviderMode;
+            return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+                StsClient stsClient = StsClient.builder()
+                        .region(Region.of(region))
+                        
.credentialsProvider(AwsCredentialsProviderFactory.createV2(
+                                awsCredentialsProviderMode,
+                                false))
+                        .build();

Review Comment:
   The StsClient instances created within the cached credential providers are 
never explicitly closed, potentially leading to resource leaks. The StsClient 
implements AutoCloseable and should be closed to release its underlying HTTP 
client resources. Consider managing the lifecycle of StsClient instances 
properly, either by caching them separately or ensuring they are closed when 
the credential provider is no longer needed.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/storage/S3Properties.java:
##########
@@ -50,13 +50,36 @@
 import java.util.Objects;
 import java.util.Optional;
 import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
 import java.util.regex.Pattern;
 import java.util.stream.Stream;
 
 public class S3Properties extends AbstractS3CompatibleProperties {
 
     private static final Logger LOG = LogManager.getLogger(S3Properties.class);
 
+    // Cached credential providers to prevent thread leak.
+    // Each credential provider creates internal ScheduledExecutorService 
threads for credential refresh.
+    // Without caching, every call creates new threads that are never cleaned 
up.
+    private static volatile InstanceProfileCredentialsProvider 
cachedInstanceProfileProvider = null;
+    private static final Object LOCK = new Object();
+
+    // Cache for role-based credential providers, keyed by 
region:roleArn:externalId:mode
+    private static final ConcurrentMap<String, AwsCredentialsProvider> 
ROLE_CREDENTIALS_CACHE =
+            new ConcurrentHashMap<>();

Review Comment:
   The credential provider caches use ConcurrentHashMap with no eviction 
policy, which means cached providers will remain in memory indefinitely. This 
could lead to memory accumulation over time if many different credential 
configurations are used throughout the application lifecycle. Consider 
implementing a cache eviction strategy using a bounded cache implementation 
like Caffeine or Guava's Cache with time-based or size-based eviction.



##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -58,12 +58,26 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
 public class S3Util {
     private static final Logger LOG = LogManager.getLogger(Util.class);
 
+    // Cached credential providers to prevent thread leak
+    // Each credential provider creates internal ScheduledExecutorService 
threads for credential refresh.
+    // Without caching, every call creates new threads that are never cleaned 
up.
+    private static volatile AwsCredentialsProvider cachedDefaultChainProvider 
= null;
+    private static volatile DefaultCredentialsProvider 
cachedDefaultCredentialsProvider = null;
+    private static volatile AwsCredentialsProvider cachedV2ChainProvider = 
null;
+    private static final Object LOCK = new Object();
+
+    // Cache for role-based credential providers, keyed by roleArn:externalId
+    private static final ConcurrentMap<String, AwsCredentialsProvider> 
ROLE_CREDENTIALS_CACHE =
+            new ConcurrentHashMap<>();

Review Comment:
   The credential provider caches use ConcurrentHashMap with no eviction 
policy, which means cached providers will remain in memory indefinitely. This 
could lead to memory accumulation over time if many different credential 
configurations are used throughout the application lifecycle. Consider 
implementing a cache eviction strategy using a bounded cache implementation 
like Caffeine or Guava's Cache with time-based or size-based eviction.



##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -148,20 +170,52 @@ private static AwsCredentialsProvider 
getAwsCredencialsProviderV1(URI endpoint,
         }
 
         if (!Strings.isNullOrEmpty(roleArn)) {
-            StsClient stsClient = StsClient.builder()
-                    .credentialsProvider(DefaultCredentialsProvider.create())
-                    .build();
-
-            return StsAssumeRoleCredentialsProvider.builder()
-                    .stsClient(stsClient)
-                    .refreshRequest(builder -> {
-                        
builder.roleArn(roleArn).roleSessionName("aws-sdk-java-v2-fe");
-                        if (!Strings.isNullOrEmpty(externalId)) {
-                            builder.externalId(externalId);
-                        }
-                    }).build();
+            // Cache role-based credentials by roleArn:externalId
+            String cacheKey = "v1:" + roleArn + ":" + (externalId != null ? 
externalId : "");

Review Comment:
   The cache key for role-based credentials is missing the region parameter, 
which could cause credentials from different regions to be incorrectly shared. 
The cache key should include the region to prevent credential providers 
configured for different AWS regions from sharing the same cached instance. 
This could lead to authentication failures when the same roleArn is used across 
multiple regions.
   ```suggestion
               // Cache role-based credentials by region:roleArn:externalId
               String cacheKey = "v1:" + (region != null ? region : "") + ":" + 
roleArn + ":"
                       + (externalId != null ? externalId : "");
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/storage/S3Properties.java:
##########
@@ -310,19 +333,24 @@ private AwsCredentialsProvider 
getAwsCredentialsProviderV1() {
             return credentialsProvider;
         }
         if (StringUtils.isNotBlank(s3IAMRole)) {
-            StsClient stsClient = StsClient.builder()
-                    .region(Region.of(region))
-                    
.credentialsProvider(InstanceProfileCredentialsProvider.create())
-                    .build();
-
-            return StsAssumeRoleCredentialsProvider.builder()
-                    .stsClient(stsClient)
-                    .refreshRequest(builder -> {
-                        
builder.roleArn(s3IAMRole).roleSessionName("aws-sdk-java-v2-fe");
-                        if (StringUtils.isNotBlank(s3ExternalId)) {
-                            builder.externalId(s3ExternalId);
-                        }
-                    }).build();
+            // Cache role-based credentials by region:roleArn:externalId:mode
+            String cacheKey = "v1:" + region + ":" + s3IAMRole + ":"
+                    + (s3ExternalId != null ? s3ExternalId : "");
+            return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+                StsClient stsClient = StsClient.builder()
+                        .region(Region.of(region))
+                        
.credentialsProvider(getCachedInstanceProfileProvider())
+                        .build();
+
+                return StsAssumeRoleCredentialsProvider.builder()
+                        .stsClient(stsClient)
+                        .refreshRequest(builder -> {
+                            
builder.roleArn(s3IAMRole).roleSessionName("aws-sdk-java-v2-fe");
+                            if (StringUtils.isNotBlank(s3ExternalId)) {
+                                builder.externalId(s3ExternalId);
+                            }
+                        }).build();
+            });
         }
         // For anonymous access (no credentials required)
         return AnonymousCredentialsProvider.create();

Review Comment:
   AnonymousCredentialsProvider.create() should also be cached since it creates 
an instance that could potentially be reused. While anonymous credentials don't 
require refresh threads like other providers, caching would be more consistent 
with the overall pattern and could prevent unnecessary object creation.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/metastore/AWSGlueMetaStoreBaseProperties.java:
##########
@@ -176,36 +204,29 @@ public AwsCredentialsProvider getAwsCredentialsProvider() 
{
                         glueSessionToken));
             }
         }
-        // If IAM role is configured, use STS AssumeRole
+        // If IAM role is configured, use STS AssumeRole with cached providers
         if (StringUtils.isNotBlank(glueIAMRole)) {
-            StsClient stsClient = StsClient.builder()
-                    .region(Region.of(glueRegion))
-                    .credentialsProvider(AwsCredentialsProviderChain.of(
-                            WebIdentityTokenFileCredentialsProvider.create(),
-                            ContainerCredentialsProvider.create(),
-                            InstanceProfileCredentialsProvider.create(),
-                            SystemPropertyCredentialsProvider.create(),
-                            EnvironmentVariableCredentialsProvider.create(),
-                            ProfileCredentialsProvider.create()))
-                    .build();
-
-            return StsAssumeRoleCredentialsProvider.builder()
-                    .stsClient(stsClient)
-                    .refreshRequest(builder -> {
-                        
builder.roleArn(glueIAMRole).roleSessionName("aws-glue-java-fe");
-                        if (StringUtils.isNotBlank(glueExternalId)) {
-                            builder.externalId(glueExternalId);
-                        }
-                    }).build();
+            // Cache role-based credentials by region:roleArn:externalId
+            String cacheKey = glueRegion + ":" + glueIAMRole + ":"
+                    + (glueExternalId != null ? glueExternalId : "");
+            return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+                StsClient stsClient = StsClient.builder()
+                        .region(Region.of(glueRegion))
+                        .credentialsProvider(getCachedDefaultChainProvider())
+                        .build();

Review Comment:
   The StsClient instances created within the cached credential providers are 
never explicitly closed, potentially leading to resource leaks. The StsClient 
implements AutoCloseable and should be closed to release its underlying HTTP 
client resources. Consider managing the lifecycle of StsClient instances 
properly, either by caching them separately or ensuring they are closed when 
the credential provider is no longer needed.



##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -177,32 +231,24 @@ private static AwsCredentialsProvider 
getAwsCredencialsProviderV2(URI endpoint,
         }
 
         if (!Strings.isNullOrEmpty(roleArn)) {
-            StsClient stsClient = StsClient.builder()
-                    .credentialsProvider(AwsCredentialsProviderChain.of(
-                            WebIdentityTokenFileCredentialsProvider.create(),
-                            ContainerCredentialsProvider.create(),
-                            InstanceProfileCredentialsProvider.create(),
-                            SystemPropertyCredentialsProvider.create(),
-                            EnvironmentVariableCredentialsProvider.create(),
-                            ProfileCredentialsProvider.create()))
-                    .build();
-
-            return StsAssumeRoleCredentialsProvider.builder()
-                    .stsClient(stsClient)
-                    .refreshRequest(builder -> {
-                        
builder.roleArn(roleArn).roleSessionName("aws-sdk-java-v2-fe");
-                        if (!Strings.isNullOrEmpty(externalId)) {
-                            builder.externalId(externalId);
-                        }
-                    }).build();
+            // Cache role-based credentials by roleArn:externalId
+            String cacheKey = "v2:" + roleArn + ":" + (externalId != null ? 
externalId : "");
+            return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+                StsClient stsClient = StsClient.builder()
+                        .credentialsProvider(getCachedV2ChainProvider())
+                        .build();

Review Comment:
   The StsClient instances created within the cached credential providers are 
never explicitly closed, potentially leading to resource leaks. The StsClient 
implements AutoCloseable and should be closed to release its underlying HTTP 
client resources. Consider managing the lifecycle of StsClient instances 
properly, either by caching them separately or ensuring they are closed when 
the credential provider is no longer needed.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/storage/S3Properties.java:
##########
@@ -310,19 +333,24 @@ private AwsCredentialsProvider 
getAwsCredentialsProviderV1() {
             return credentialsProvider;
         }
         if (StringUtils.isNotBlank(s3IAMRole)) {
-            StsClient stsClient = StsClient.builder()
-                    .region(Region.of(region))
-                    
.credentialsProvider(InstanceProfileCredentialsProvider.create())
-                    .build();
-
-            return StsAssumeRoleCredentialsProvider.builder()
-                    .stsClient(stsClient)
-                    .refreshRequest(builder -> {
-                        
builder.roleArn(s3IAMRole).roleSessionName("aws-sdk-java-v2-fe");
-                        if (StringUtils.isNotBlank(s3ExternalId)) {
-                            builder.externalId(s3ExternalId);
-                        }
-                    }).build();
+            // Cache role-based credentials by region:roleArn:externalId:mode
+            String cacheKey = "v1:" + region + ":" + s3IAMRole + ":"
+                    + (s3ExternalId != null ? s3ExternalId : "");
+            return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+                StsClient stsClient = StsClient.builder()
+                        .region(Region.of(region))
+                        
.credentialsProvider(getCachedInstanceProfileProvider())
+                        .build();

Review Comment:
   The StsClient instances created within the cached credential providers are 
never explicitly closed, potentially leading to resource leaks. The StsClient 
implements AutoCloseable and should be closed to release its underlying HTTP 
client resources. Consider managing the lifecycle of StsClient instances 
properly, either by caching them separately or ensuring they are closed when 
the credential provider is no longer needed.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/metastore/AWSGlueMetaStoreBaseProperties.java:
##########
@@ -40,10 +40,38 @@
 import 
software.amazon.awssdk.services.sts.auth.StsAssumeRoleCredentialsProvider;
 
 import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
 public class AWSGlueMetaStoreBaseProperties {
+    // Cached credential providers to prevent thread leak.
+    // Each credential provider creates internal ScheduledExecutorService 
threads for credential refresh.
+    // Without caching, every call creates new threads that are never cleaned 
up.
+    private static volatile AwsCredentialsProvider cachedDefaultChainProvider 
= null;
+    private static final Object LOCK = new Object();
+
+    // Cache for role-based credential providers, keyed by 
region:roleArn:externalId
+    private static final ConcurrentMap<String, AwsCredentialsProvider> 
ROLE_CREDENTIALS_CACHE =
+            new ConcurrentHashMap<>();

Review Comment:
   The credential provider caches use ConcurrentHashMap with no eviction 
policy, which means cached providers will remain in memory indefinitely. This 
could lead to memory accumulation over time if many different credential 
configurations are used throughout the application lifecycle. Consider 
implementing a cache eviction strategy using a bounded cache implementation 
like Caffeine or Guava's Cache with time-based or size-based eviction.



##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -177,32 +231,24 @@ private static AwsCredentialsProvider 
getAwsCredencialsProviderV2(URI endpoint,
         }
 
         if (!Strings.isNullOrEmpty(roleArn)) {
-            StsClient stsClient = StsClient.builder()
-                    .credentialsProvider(AwsCredentialsProviderChain.of(
-                            WebIdentityTokenFileCredentialsProvider.create(),
-                            ContainerCredentialsProvider.create(),
-                            InstanceProfileCredentialsProvider.create(),
-                            SystemPropertyCredentialsProvider.create(),
-                            EnvironmentVariableCredentialsProvider.create(),
-                            ProfileCredentialsProvider.create()))
-                    .build();
-
-            return StsAssumeRoleCredentialsProvider.builder()
-                    .stsClient(stsClient)
-                    .refreshRequest(builder -> {
-                        
builder.roleArn(roleArn).roleSessionName("aws-sdk-java-v2-fe");
-                        if (!Strings.isNullOrEmpty(externalId)) {
-                            builder.externalId(externalId);
-                        }
-                    }).build();
+            // Cache role-based credentials by roleArn:externalId
+            String cacheKey = "v2:" + roleArn + ":" + (externalId != null ? 
externalId : "");
+            return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+                StsClient stsClient = StsClient.builder()
+                        .credentialsProvider(getCachedV2ChainProvider())
+                        .build();

Review Comment:
   The cache key for role-based credentials is missing the region parameter, 
which could cause credentials from different regions to be incorrectly shared. 
The cache key should include the region to prevent credential providers 
configured for different AWS regions from sharing the same cached instance. 
This could lead to authentication failures when the same roleArn is used across 
multiple regions.
   ```suggestion
               // Cache role-based credentials by roleArn:externalId:region
               String cacheKey = "v2:" + roleArn + ":" + (externalId != null ? 
externalId : "")
                       + ":" + (region != null ? region : "");
               return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
                   StsClient.Builder stsBuilder = StsClient.builder()
                           .credentialsProvider(getCachedV2ChainProvider());
                   if (!Strings.isNullOrEmpty(region)) {
                       stsBuilder = stsBuilder.region(Region.of(region));
                   }
                   StsClient stsClient = stsBuilder.build();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to