github-actions[bot] commented on code in PR #63648:
URL: https://github.com/apache/doris/pull/63648#discussion_r3300800073
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java:
##########
@@ -570,6 +570,17 @@ public TableScan createTableScan() throws UserException {
}
private CloseableIterable<FileScanTask> planFileScanTask(TableScan scan) {
+ long startTime = System.currentTimeMillis();
+ try {
+ return doPlanFileScanTask(scan);
+ } finally {
+ if (getSummaryProfile() != null) {
+
getSummaryProfile().addExternalTableGetFileScanTasksTime(System.currentTimeMillis()
- startTime);
Review Comment:
This records only the time to create the `CloseableIterable`, not
necessarily the time to plan/iterate the file scan tasks. In the
manifest-cache-disabled path, batch mode returns
`TableScanUtil.splitFiles(scan.planFiles(), ...)`, and the actual Iceberg
`planFiles()` iteration happens later in `doStartSplit()`/`doGetSplits()` after
this `finally` has already added the metric. For batch Iceberg scans without
manifest cache, the new `External Table Get File Scan Tasks Time` can therefore
be near zero while the expensive metadata planning happens outside the timer.
Please measure around the iteration or return a timed iterable so the recorded
value covers the actual scan-task planning work.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hudi/source/HudiScanNode.java:
##########
@@ -447,9 +461,13 @@ public List<Split> getSplits(int numBackends) throws
UserException {
return getIncrementalSplits();
}
if (!partitionInit) {
+ long startTime = System.currentTimeMillis();
try {
prunedPartitions =
hmsTable.getCatalog().getExecutionAuthenticator().execute(()
-> getPrunedPartitions(hudiClient));
Review Comment:
This timer is skipped in the normal planning flow.
`FileQueryScanNode.createScanRangeLocations()` calls `isBatchMode()` before
`getSplits()`, and Hudi's `isBatchMode()` initializes `prunedPartitions` and
sets `partitionInit = true` for full-table scans. By the time execution reaches
this `if (!partitionInit)` block, it is already false, so `External Table Get
Partitions Time` and the aggregate external meta time do not include Hudi
partition pruning for ordinary Hudi scans. Please time the partition
initialization where `isBatchMode()` does it, or refactor the partition
initialization into one helper that records the profile exactly once.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]