Re: [PR] feat(common): add log reader scan metrics and logging for log block processing [hudi]

via GitHub Tue, 07 Apr 2026 11:40:07 -0700


yihua commented on code in PR #18412:
URL: https://github.com/apache/hudi/pull/18412#discussion_r3047139363



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java:
##########
@@ -373,11 +409,17 @@ && compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME), GREATER_THA
           validBlockInstants.add(compactedFinalInstantTime);
         }
       }
+      Collections.reverse(validBlockInstants);
       LOG.debug("Number of applied rollback blocks {}", numBlocksRolledBack);
-
+      LOG.info("Total valid instants found are {}. Instants are {}", validBlockInstants.size(), validBlockInstants);
+      if (ignoredBlockCount > 0) {
+        LOG.info("Ignored {} log blocks from {} instants not in the range: {}", ignoredBlockCount, ignoredInstants.size(), ignoredInstants);

Review Comment:
   🤖 This INFO log fires once per file slice whenever any blocks are 
range-filtered. During incremental reads over a narrow window on a large table, 
you'd get one log line per file slice, each printing the full `ignoredInstants` 
set. If that set grows large (many distinct instants being filtered), the 
output could be noisy. Would it be worth capping the printed set or logging it 
at DEBUG instead?
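One possible shape for the "capping" option, as a minimal sketch only: `IgnoredInstantsSummary` and `summarize` are hypothetical names, not existing Hudi APIs, and the instant times in `main` are made-up sample values. The helper prints at most `maxItems` elements of the set and then a count of the rest, so the INFO line stays bounded however many instants are filtered.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of Hudi): renders a collection for a log
 * message, printing at most maxItems elements followed by "... (+N more)".
 */
public class IgnoredInstantsSummary {

  public static String summarize(Collection<?> items, int maxItems) {
    if (maxItems < 0) {
      throw new IllegalArgumentException("maxItems must be >= 0");
    }
    StringBuilder sb = new StringBuilder("[");
    Iterator<?> it = items.iterator();
    int printed = 0;
    // Append elements in iteration order until the cap is reached.
    while (it.hasNext() && printed < maxItems) {
      if (printed > 0) {
        sb.append(", ");
      }
      sb.append(it.next());
      printed++;
    }
    // Summarize whatever was not printed instead of dumping it.
    int remaining = items.size() - printed;
    if (remaining > 0) {
      if (printed > 0) {
        sb.append(", ");
      }
      sb.append("... (+").append(remaining).append(" more)");
    }
    return sb.append(']').toString();
  }

  public static void main(String[] args) {
    // Sample instant times (made up for illustration); TreeSet keeps them sorted.
    TreeSet<String> ignoredInstants = new TreeSet<>(List.of(
        "20260401000000", "20260402000000", "20260403000000", "20260404000000"));
    // In the reader, this string would be passed as the last {} argument of LOG.info(...).
    System.out.println(summarize(ignoredInstants, 2));
  }
}
```

Running `main` prints `[20260401000000, 20260402000000, ... (+2 more)]`. The other option from the comment keeps only the counts at INFO and moves the full `ignoredInstants` set to a separate DEBUG statement; capping instead preserves a useful sample at INFO without unbounded output.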



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

