github-actions[bot] commented on code in PR #63648:
URL: https://github.com/apache/doris/pull/63648#discussion_r3300800073


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java:
##########
@@ -570,6 +570,17 @@ public TableScan createTableScan() throws UserException {
     }
 
     private CloseableIterable<FileScanTask> planFileScanTask(TableScan scan) {
+        long startTime = System.currentTimeMillis();
+        try {
+            return doPlanFileScanTask(scan);
+        } finally {
+            if (getSummaryProfile() != null) {
+                
getSummaryProfile().addExternalTableGetFileScanTasksTime(System.currentTimeMillis()
 - startTime);

Review Comment:
   This records only the time to create the `CloseableIterable`, not 
necessarily the time to plan/iterate the file scan tasks. In the 
manifest-cache-disabled path, batch mode returns 
`TableScanUtil.splitFiles(scan.planFiles(), ...)`, and the actual Iceberg 
`planFiles()` iteration happens later in `doStartSplit()`/`doGetSplits()` after 
this `finally` has already added the metric. For batch Iceberg scans without 
manifest cache, the new `External Table Get File Scan Tasks Time` can therefore 
be near zero while the expensive metadata planning happens outside the timer. 
Please measure around the iteration or return a timed iterable so the recorded 
value covers the actual scan-task planning work.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hudi/source/HudiScanNode.java:
##########
@@ -447,9 +461,13 @@ public List<Split> getSplits(int numBackends) throws 
UserException {
             return getIncrementalSplits();
         }
         if (!partitionInit) {
+            long startTime = System.currentTimeMillis();
             try {
                 prunedPartitions = 
hmsTable.getCatalog().getExecutionAuthenticator().execute(()
                         -> getPrunedPartitions(hudiClient));

Review Comment:
   This timer is skipped in the normal planning flow. 
`FileQueryScanNode.createScanRangeLocations()` calls `isBatchMode()` before 
`getSplits()`, and Hudi's `isBatchMode()` initializes `prunedPartitions` and 
sets `partitionInit = true` for full-table scans. By the time execution reaches 
this `if (!partitionInit)` block, it is already false, so `External Table Get 
Partitions Time` and the aggregate external meta time do not include Hudi 
partition pruning for ordinary Hudi scans. Please time the partition 
initialization where `isBatchMode()` does it, or refactor the partition 
initialization into one helper that records the profile exactly once.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to