yihua commented on code in PR #18417:
URL: https://github.com/apache/hudi/pull/18417#discussion_r3035649721
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -385,7 +394,13 @@ protected List<PartitionPath>
listPartitionPaths(List<String> relativePartitionP
HoodieTimeline timelineToQuery = findInstantsInRange();
matchedPartitionPaths =
TimelineUtils.getWrittenPartitions(timelineToQuery);
} else {
Review Comment:
🤖 The asymmetry is reasonable IMO — `TimelineUtils.getWrittenPartitions`
reads from the already-loaded in-memory timeline, so it's typically O(commits)
with no filesystem I/O, while `getPartitionPathWithPathPrefixes` does actual
path listing against the filesystem or metadata table. That said, for very
large timelines (many commits, long retention) it could be non-trivial, so
adding timing there for completeness wouldn't hurt — but I wouldn't call it a
blocker.
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -357,18 +358,26 @@ private Map<PartitionPath, List<FileSlice>>
filterFiles(List<PartitionPath> part
.orElseGet(() ->
finalFileSystemView.getLatestFileSlices(partitionPath.path))
.collect(Collectors.toList())
));
+ } finally {
+ log.debug("On {} with query instant as {}, it took {}ms to filter {}
files into file slices across {} partitions",
Review Comment:
🤖 The `finally` here has no corresponding `catch`, so this log fires with
the same wording whether the filter succeeded or threw an exception. During
debugging, seeing "it took Xms to filter Y files into file slices" right before
a stack trace could be confusing. Would it be worth omitting the log (or using
a different message) on the exception path, similar to how you guard the
cache-miss log?
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -502,7 +527,10 @@ private List<StoragePathInfo>
listPartitionPathFiles(List<PartitionPath> partiti
Review Comment:
🤖 Same pattern here — on the `IOException` path the finally logs "it took
Xms to fetch files for N uncached partitions via getAllFilesInPartitions",
which implies the fetch completed. Since the exception message is also logged
by the caller, the timing line could mislead someone into thinking the fetch
succeeded. Could this message be written in a way that's unambiguous on both
success and failure paths, e.g. "...time spent in getAllFilesInPartitions
attempt..."?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]