yihua commented on code in PR #18417:
URL: https://github.com/apache/hudi/pull/18417#discussion_r3035649721


##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -385,7 +394,13 @@ protected List<PartitionPath> 
listPartitionPaths(List<String> relativePartitionP
           HoodieTimeline timelineToQuery = findInstantsInRange();
           matchedPartitionPaths = 
TimelineUtils.getWrittenPartitions(timelineToQuery);
         } else {

Review Comment:
   🤖 The asymmetry is reasonable IMO — `TimelineUtils.getWrittenPartitions` 
reads from the already-loaded in-memory timeline, so it's typically O(commits) 
with no filesystem I/O, while `getPartitionPathWithPathPrefixes` does actual 
path listing against the filesystem or metadata table. That said, for very 
large timelines (many commits, long retention) it could be non-trivial, so 
adding timing there for completeness wouldn't hurt — but I wouldn't call it a 
blocker.



##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -357,18 +358,26 @@ private Map<PartitionPath, List<FileSlice>> 
filterFiles(List<PartitionPath> part
                       .orElseGet(() -> 
finalFileSystemView.getLatestFileSlices(partitionPath.path))
                       .collect(Collectors.toList())
           ));
+    } finally {
+      log.debug("On {} with query instant as {}, it took {}ms to filter {} 
files into file slices across {} partitions",

Review Comment:
   🤖 The `finally` here has no corresponding `catch`, so this log fires with 
the same wording whether the filter succeeded or threw an exception. During 
debugging, seeing "it took Xms to filter Y files into file slices" right before 
a stack trace could be confusing. Would it be worth omitting the log (or using 
a different message) on the exception path, similar to how you guard the 
cache-miss log?



##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -502,7 +527,10 @@ private List<StoragePathInfo> 
listPartitionPathFiles(List<PartitionPath> partiti
 

Review Comment:
   🤖 Same pattern here — on the `IOException` path the finally logs "it took 
Xms to fetch files for N uncached partitions via getAllFilesInPartitions", 
which implies the fetch completed. Since the exception message is also logged 
by the caller, the timing line could mislead someone into thinking the fetch 
succeeded. Could this message be written in a way that's unambiguous on both 
success and failure paths, e.g. "...time spent in getAllFilesInPartitions 
attempt..."?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to