Copilot commented on code in PR #61776:
URL: https://github.com/apache/doris/pull/61776#discussion_r2994467348
##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/AzureObjStorage.java:
##########
@@ -436,6 +453,86 @@ public Status globList(String remotePath, List<RemoteFile>
result, boolean fileN
return st;
}
+ /**
+ * Get file metadata using getProperties requests for deterministic paths.
+ * This avoids requiring list permission when only read permission is
granted.
+ *
+ * @param bucket Azure container name
+ * @param keyPattern The key pattern (may contain {..} brace or [...]
bracket patterns but no wildcards)
+ * @param result List to store matching RemoteFile objects
+ * @param fileNameOnly If true, only store file names; otherwise store
full paths
+ * @param startTime Start time for logging duration
+ * @return Status if successful, null if should fall back to listing
+ */
+ private Status globListByGetProperties(String bucket, String keyPattern,
+ List<RemoteFile> result, boolean fileNameOnly, long startTime) {
+ try {
+ // First expand [...] brackets to {...} braces, then expand {..}
ranges, then expand braces
+ String expandedPattern = S3Util.expandBracketPatterns(keyPattern);
+ expandedPattern = S3Util.extendGlobs(expandedPattern);
+ List<String> expandedPaths =
S3Util.expandBracePatterns(expandedPattern);
+
+ // Fall back to listing if too many paths to avoid overwhelming
Azure with requests
+ // Controlled by config: s3_head_request_max_paths
+ if (expandedPaths.size() > Config.s3_head_request_max_paths) {
+ LOG.info("Expanded path count {} exceeds limit {}, falling
back to LIST",
+ expandedPaths.size(),
Config.s3_head_request_max_paths);
+ return null;
+ }
Review Comment:
`s3_head_request_max_paths` is checked only after expanding patterns into
the full `expandedPaths` list, so extremely large deterministic patterns can
still consume significant CPU/memory during expansion before triggering the
fallback. Consider adding short-circuiting/early-abort expansion once the
configured max is exceeded.
##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/S3ObjStorage.java:
##########
@@ -718,6 +741,91 @@ private GlobListResult globListInternal(String remotePath,
List<RemoteFile> resu
}
}
+ /**
+ * Get file metadata using HEAD requests for deterministic paths.
+ * This avoids requiring ListBucket permission when only GetObject
permission is granted.
+ *
+ * @param bucket S3 bucket name
+ * @param keyPattern The key pattern (may contain {..} brace or [...]
bracket patterns but no wildcards)
+ * @param result List to store matching RemoteFile objects
+ * @param fileNameOnly If true, only store file names; otherwise store
full S3 paths
+ * @param startTime Start time for logging duration
+ * @return GlobListResult if successful, null if should fall back to
listing
+ */
+ private GlobListResult globListByHeadRequests(String bucket, String
keyPattern,
+ List<RemoteFile> result, boolean fileNameOnly, long startTime) {
+ try {
+ // First expand [...] brackets to {...} braces, then expand {..}
ranges, then expand braces
+ String expandedPattern = S3Util.expandBracketPatterns(keyPattern);
+ expandedPattern = S3Util.extendGlobs(expandedPattern);
+ List<String> expandedPaths =
S3Util.expandBracePatterns(expandedPattern);
+
+ // Fall back to listing if too many paths to avoid overwhelming S3
with HEAD requests
+ // Controlled by config: s3_head_request_max_paths
+ if (expandedPaths.size() > Config.s3_head_request_max_paths) {
+ LOG.info("Expanded path count {} exceeds limit {}, falling
back to LIST",
+ expandedPaths.size(),
Config.s3_head_request_max_paths);
+ return null;
+ }
Review Comment:
`s3_head_request_max_paths` is checked only after fully expanding
bracket/range/brace patterns. For very large deterministic patterns, expansion
itself can be expensive (CPU/memory) before you ever hit the limit. Consider
adding an expansion routine that can short-circuit once the max is exceeded (or
pre-compute/estimate expansion cardinality) so the config actually protects the
system.
##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/AzureObjStorage.java:
##########
@@ -436,6 +453,86 @@ public Status globList(String remotePath, List<RemoteFile>
result, boolean fileN
return st;
}
+ /**
+ * Get file metadata using getProperties requests for deterministic paths.
+ * This avoids requiring list permission when only read permission is
granted.
+ *
+ * @param bucket Azure container name
+ * @param keyPattern The key pattern (may contain {..} brace or [...]
bracket patterns but no wildcards)
+ * @param result List to store matching RemoteFile objects
+ * @param fileNameOnly If true, only store file names; otherwise store
full paths
+ * @param startTime Start time for logging duration
+ * @return Status if successful, null if should fall back to listing
+ */
+ private Status globListByGetProperties(String bucket, String keyPattern,
+ List<RemoteFile> result, boolean fileNameOnly, long startTime) {
+ try {
+ // First expand [...] brackets to {...} braces, then expand {..}
ranges, then expand braces
+ String expandedPattern = S3Util.expandBracketPatterns(keyPattern);
+ expandedPattern = S3Util.extendGlobs(expandedPattern);
+ List<String> expandedPaths =
S3Util.expandBracePatterns(expandedPattern);
+
+ // Fall back to listing if too many paths to avoid overwhelming
Azure with requests
+ // Controlled by config: s3_head_request_max_paths
+ if (expandedPaths.size() > Config.s3_head_request_max_paths) {
+ LOG.info("Expanded path count {} exceeds limit {}, falling
back to LIST",
+ expandedPaths.size(),
Config.s3_head_request_max_paths);
+ return null;
+ }
+
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Using getProperties requests for deterministic path
pattern, expanded to {} paths",
+ expandedPaths.size());
+ }
+
+ BlobContainerClient containerClient =
getClient().getBlobContainerClient(bucket);
+ long matchCnt = 0;
+ for (String key : expandedPaths) {
+ String fullPath = constructS3Path(key, bucket);
+ try {
+ BlobClient blobClient = containerClient.getBlobClient(key);
+ BlobProperties props = blobClient.getProperties();
+
+ matchCnt++;
+ RemoteFile remoteFile = new RemoteFile(
+ fileNameOnly ?
Paths.get(key).getFileName().toString() : fullPath,
+ true, // isFile
+ props.getBlobSize(),
+ props.getBlobSize(),
+ props.getLastModified() != null
+ ? props.getLastModified().toEpochSecond()
: 0
+ );
+ result.add(remoteFile);
+
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("getProperties success for {}: size={}",
fullPath, props.getBlobSize());
+ }
+ } catch (BlobStorageException e) {
+ if (e.getStatusCode() == HttpStatus.SC_NOT_FOUND
+ ||
BlobErrorCode.BLOB_NOT_FOUND.equals(e.getErrorCode())) {
+ // File does not exist, skip it (this is expected for
some expanded patterns)
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("File does not exist (skipped): {}",
fullPath);
+ }
+ } else {
+ throw e;
+ }
+ }
+ }
+
+ if (LOG.isDebugEnabled()) {
+ long duration = System.nanoTime() - startTime;
+ LOG.debug("Deterministic path getProperties requests: checked
{} paths, found {} files, took {} ms",
+ expandedPaths.size(), matchCnt, duration / 1000 /
1000);
+ }
+
+ return Status.OK;
+ } catch (Exception e) {
+ LOG.warn("Failed to use getProperties requests, falling back to
listing: {}", e.getMessage());
+ return null;
+ }
Review Comment:
If an exception occurs after some successful getProperties calls, this
method returns null to fall back to LIST but leaves already-added entries in
`result`, which can lead to duplicates/partial results when the listing path
runs. Consider building results in a temporary list and only appending on
success (or rollback `result` before returning null).
##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/S3ObjStorage.java:
##########
@@ -718,6 +741,91 @@ private GlobListResult globListInternal(String remotePath,
List<RemoteFile> resu
}
}
+ /**
+ * Get file metadata using HEAD requests for deterministic paths.
+ * This avoids requiring ListBucket permission when only GetObject
permission is granted.
+ *
+ * @param bucket S3 bucket name
+ * @param keyPattern The key pattern (may contain {..} brace or [...]
bracket patterns but no wildcards)
+ * @param result List to store matching RemoteFile objects
+ * @param fileNameOnly If true, only store file names; otherwise store
full S3 paths
+ * @param startTime Start time for logging duration
+ * @return GlobListResult if successful, null if should fall back to
listing
+ */
+ private GlobListResult globListByHeadRequests(String bucket, String
keyPattern,
+ List<RemoteFile> result, boolean fileNameOnly, long startTime) {
+ try {
+ // First expand [...] brackets to {...} braces, then expand {..}
ranges, then expand braces
+ String expandedPattern = S3Util.expandBracketPatterns(keyPattern);
+ expandedPattern = S3Util.extendGlobs(expandedPattern);
+ List<String> expandedPaths =
S3Util.expandBracePatterns(expandedPattern);
+
+ // Fall back to listing if too many paths to avoid overwhelming S3
with HEAD requests
+ // Controlled by config: s3_head_request_max_paths
+ if (expandedPaths.size() > Config.s3_head_request_max_paths) {
+ LOG.info("Expanded path count {} exceeds limit {}, falling
back to LIST",
+ expandedPaths.size(),
Config.s3_head_request_max_paths);
+ return null;
+ }
+
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Using HEAD requests for deterministic path pattern,
expanded to {} paths",
+ expandedPaths.size());
+ }
+
+ long matchCnt = 0;
+ for (String key : expandedPaths) {
+ String fullPath = "s3://" + bucket + "/" + key;
+ try {
+ HeadObjectResponse headResponse = getClient()
+ .headObject(HeadObjectRequest.builder()
+ .bucket(bucket)
+ .key(key)
+ .build());
+
+ matchCnt++;
+ RemoteFile remoteFile = new RemoteFile(
+ fileNameOnly ?
Paths.get(key).getFileName().toString() : fullPath,
+ true, // isFile
+ headResponse.contentLength(),
+ headResponse.contentLength(),
+ headResponse.lastModified() != null
+ ?
headResponse.lastModified().toEpochMilli() : 0
+ );
+ result.add(remoteFile);
+
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("HEAD success for {}: size={}", fullPath,
headResponse.contentLength());
+ }
+ } catch (NoSuchKeyException e) {
+ // File does not exist, skip it (this is expected for some
expanded patterns)
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("File does not exist (skipped): {}",
fullPath);
+ }
+ } catch (S3Exception e) {
+ if (e.statusCode() == HttpStatus.SC_NOT_FOUND) {
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("File does not exist (skipped): {}",
fullPath);
+ }
+ } else {
+ throw e;
+ }
+ }
+ }
+
+ if (LOG.isDebugEnabled()) {
+ long duration = System.nanoTime() - startTime;
+ LOG.debug("Deterministic path HEAD requests: checked {} paths,
found {} files, took {} ms",
+ expandedPaths.size(), matchCnt, duration / 1000 /
1000);
+ }
+
+ return new GlobListResult(Status.OK, "", bucket, "");
+ } catch (Exception e) {
+ LOG.warn("Failed to use HEAD requests, falling back to listing:
{}", e.getMessage());
+ return null;
+ }
Review Comment:
If an exception occurs after some successful HEADs, this method returns null
to fall back to LIST but leaves already-added entries in `result`, which can
lead to duplicates or partial results when listing adds the same files again.
Consider accumulating results in a temporary list and only appending to
`result` if the whole HEAD pass succeeds (or rollback to the original size
before returning null).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]