suryaprasanna commented on code in PR #18417:
URL: https://github.com/apache/hudi/pull/18417#discussion_r3054693176
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -476,6 +491,12 @@ private List<StoragePathInfo> listPartitionPathFiles(List<PartitionPath> partiti
     Set<StoragePath> missingPartitionPaths =
         CollectionUtils.diffSet(partitionPaths, cachedPartitionPaths.keySet());
+    if (missingPartitionPaths.isEmpty()) {
+      return cachedPartitionPaths.values().stream()
+          .flatMap(Collection::stream)
+          .collect(Collectors.toList());
+    }
+
Review Comment:
Makes sense. I will update the PR description to call out the early-return
optimization instead of saying there is no functional behavior change.
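For reference, a minimal sketch of the early-return pattern discussed here. It is not the Hudi implementation: `CachedListingSketch`, `listFiles`, `loaderCalls`, and the `loader` parameter are hypothetical names, partitions and files are plain strings, and the set difference uses `removeAll` in place of `CollectionUtils.diffSet`. Like the diff, the early return flattens every cached listing; what the slow path returns is an assumption made only for this sketch.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CachedListingSketch {
  // Hypothetical cache: partition path -> files already listed for it.
  private final Map<String, List<String>> cache = new LinkedHashMap<>();
  // Counts how often the (expensive) loader is invoked.
  public int loaderCalls = 0;

  public List<String> listFiles(List<String> partitions, Function<String, List<String>> loader) {
    // Partitions that were requested but are not cached yet.
    Set<String> missing = new HashSet<>(partitions);
    missing.removeAll(cache.keySet());

    // Early return: nothing is missing, so no listing call is needed.
    if (missing.isEmpty()) {
      return flattenCache();
    }

    // Slow path (assumed for this sketch): load and cache each missing partition.
    for (String partition : missing) {
      loaderCalls++;
      cache.put(partition, loader.apply(partition));
    }
    return flattenCache();
  }

  // Flattens every cached listing into one list, as the early return in the diff does.
  private List<String> flattenCache() {
    return cache.values().stream()
        .flatMap(Collection::stream)
        .collect(Collectors.toList());
  }
}
```

The point for the PR description is that a fully cached request now skips the listing path entirely, which is an observable change even if the returned files are the same.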
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -501,7 +527,10 @@ private List<StoragePathInfo> listPartitionPathFiles(List<PartitionPath> partiti
     return result;
Review Comment:
Updated this one as well to use neutral attempt-based wording in the finally
block, so the timing line is unambiguous even when the fetch path throws.
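To illustrate why the wording matters, here is a small sketch of timing in a `finally` block. It is an assumption-laden stand-in, not the PR's code: `TimedAttemptSketch`, `timed`, and `lastLogLine` are hypothetical, the message text is made up for illustration, and `System.nanoTime()` replaces `HoodieTimer`. The line is stored in a field instead of being sent to a logger so the sketch has no dependencies.

```java
import java.util.function.Supplier;

public class TimedAttemptSketch {
  // Last emitted line, kept in a field so the sketch needs no logging library.
  public static String lastLogLine;

  public static <T> T timed(String tableName, Supplier<T> fetch) {
    long startNanos = System.nanoTime();
    try {
      return fetch.get();
    } finally {
      long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
      // finally runs on both success and exception, so the message says
      // "attempt" and makes no claim that the fetch succeeded.
      lastLogLine = String.format("On %s, the listing attempt took %d ms", tableName, elapsedMs);
    }
  }
}
```

Because the `finally` block runs whether or not `fetch` throws, a message such as "it took N ms to list" would read as a success report on the failure path, while attempt-based wording stays accurate in both cases.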
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -357,18 +358,26 @@ private Map<PartitionPath, List<FileSlice>> filterFiles(List<PartitionPath> part
                   .orElseGet(() -> finalFileSystemView.getLatestFileSlices(partitionPath.path))
                   .collect(Collectors.toList())
           ));
+    } finally {
+      log.debug("On {} with query instant as {}, it took {}ms to filter {} files into file slices across {} partitions",
+          metaClient.getTableConfig().getTableName(), queryInstant.orElse("N/A"), timer.endTimer(),
+          allFiles.size(), partitions.size());
     }
   }
   protected List<PartitionPath> listPartitionPaths(List<String> relativePartitionPaths,
                                                    Types.RecordType partitionFields,
                                                    Expression partitionColumnPredicates) {
     List<String> matchedPartitionPaths;
+    HoodieTimer timer = HoodieTimer.start();
     try {
       matchedPartitionPaths = tableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression(relativePartitionPaths,
           partitionFields, partitionColumnPredicates);
     } catch (IOException e) {
       throw new HoodieIOException("Error fetching partition paths", e);
Review Comment:
Updated this to include the table name for consistency with the exception
message in listPartitionPathFiles. I made the same enrichment in both
listPartitionPaths overloads.
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -357,18 +358,26 @@ private Map<PartitionPath, List<FileSlice>> filterFiles(List<PartitionPath> part
                   .orElseGet(() -> finalFileSystemView.getLatestFileSlices(partitionPath.path))
                   .collect(Collectors.toList())
           ));
+    } finally {
+      log.debug("On {} with query instant as {}, it took {}ms to filter {} files into file slices across {} partitions",
+          metaClient.getTableConfig().getTableName(), queryInstant.orElse("N/A"), timer.endTimer(),
+          allFiles.size(), partitions.size());
     }
   }
   protected List<PartitionPath> listPartitionPaths(List<String> relativePartitionPaths,
                                                    Types.RecordType partitionFields,
                                                    Expression partitionColumnPredicates) {
     List<String> matchedPartitionPaths;
+    HoodieTimer timer = HoodieTimer.start();
     try {
       matchedPartitionPaths = tableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression(relativePartitionPaths,
           partitionFields, partitionColumnPredicates);
     } catch (IOException e) {
       throw new HoodieIOException("Error fetching partition paths", e);
+    } finally {
+      log.debug("On {}, it took {} ms to list partition paths with {} relativePartitionPaths and partition predicates",
Review Comment:
Updated this to include the table name for consistency with the exception
message in listPartitionPathFiles. I made the same enrichment in both
listPartitionPaths overloads.
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
Review Comment:
Rephrased the DEBUG message to neutral attempt-based wording so it reads
correctly on both the success and exception paths.
--
This is an automated message from the Apache Git Service.