yihua commented on code in PR #18306:
URL: https://github.com/apache/hudi/pull/18306#discussion_r3036084259
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/ScheduleCompactionActionExecutor.java:
##########
@@ -220,6 +235,26 @@ private boolean needCompact(CompactionTriggerStrategy
compactionTriggerStrategy)
return compactable;
}
+ /**
+ * Determines whether log compaction should be scheduled based on the number
of delta commits
+ * since the last compaction and the last log compaction, compared against
the
+ * {@code hoodie.log.compaction.blocks.threshold} config.
+ */
+ private boolean needLogCompact(Pair<Integer, String>
latestDeltaCommitInfoSinceCompact) {
+ Option<Pair<Integer, String>> latestDeltaCommitInfoSinceLogCompactOption =
getLatestDeltaCommitInfoSinceLogCompaction();
+ int numDeltaCommitsSinceLatestCompaction =
latestDeltaCommitInfoSinceCompact.getLeft();
+ int numDeltaCommitsSinceLatestLogCompaction =
latestDeltaCommitInfoSinceLogCompactOption.isPresent()
+ ? latestDeltaCommitInfoSinceLogCompactOption.get().getLeft()
+ : 0;
+
+ int numDeltaCommitsSince = Math.min(numDeltaCommitsSinceLatestCompaction,
numDeltaCommitsSinceLatestLogCompaction);
+ boolean shouldLogCompact = numDeltaCommitsSince >=
config.getLogCompactionBlocksThreshold();
+ if (shouldLogCompact) {
+ log.info("There have been {} delta commits since last compaction or log
compaction, triggering log compaction.", numDeltaCommitsSince);
Review Comment:
<a href="#"><img alt="P2"
src="https://greptile-static-assets.s3.amazonaws.com/badges/p2.svg?v=7"
align="top"></a> **Inaccurate log message wording**
`numDeltaCommitsSince` is `Math.min(sinceCompaction, sinceLogCompaction)`,
which represents the count since the **most recent** of the two events. The
current message "since last compaction or log compaction" is ambiguous — it
reads as if either event is used as the reference. Consider:
```suggestion
log.info("There have been {} delta commits since the most recent
compaction or log compaction, triggering log compaction.",
numDeltaCommitsSince);
```
— *Greptile*
([original](https://github.com/yihua/hudi/pull/11#discussion_r3036084156))
(source:comment#3036084156)
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/ScheduleCompactionActionExecutor.java:
##########
@@ -220,6 +235,26 @@ private boolean needCompact(CompactionTriggerStrategy
compactionTriggerStrategy)
return compactable;
}
+ /**
+ * Determines whether log compaction should be scheduled based on the number
of delta commits
+ * since the last compaction and the last log compaction, compared against
the
+ * {@code hoodie.log.compaction.blocks.threshold} config.
+ */
+ private boolean needLogCompact(Pair<Integer, String>
latestDeltaCommitInfoSinceCompact) {
+ Option<Pair<Integer, String>> latestDeltaCommitInfoSinceLogCompactOption =
getLatestDeltaCommitInfoSinceLogCompaction();
+ int numDeltaCommitsSinceLatestCompaction =
latestDeltaCommitInfoSinceCompact.getLeft();
+ int numDeltaCommitsSinceLatestLogCompaction =
latestDeltaCommitInfoSinceLogCompactOption.isPresent()
+ ? latestDeltaCommitInfoSinceLogCompactOption.get().getLeft()
+ : 0;
+
+ int numDeltaCommitsSince = Math.min(numDeltaCommitsSinceLatestCompaction,
numDeltaCommitsSinceLatestLogCompaction);
+ boolean shouldLogCompact = numDeltaCommitsSince >=
config.getLogCompactionBlocksThreshold();
Review Comment:
<a href="#"><img alt="P1"
src="https://greptile-static-assets.s3.amazonaws.com/badges/p1.svg?v=7"
align="top"></a> **Implicit sentinel `0` breaks when
`LogCompactionBlocksThreshold` is 0**
When `latestDeltaCommitInfoSinceLogCompactOption` is empty (which happens
when `getDeltaCommitsSinceLatestCompletedLogCompaction` detects a pending log
compaction), `numDeltaCommitsSinceLatestLogCompaction` is set to `0`. The
subsequent `Math.min(X, 0) = 0` then makes `0 >= threshold` evaluate to
`false`, silently preventing a second log compaction from being scheduled on
top of a pending one.
This works correctly for any positive threshold, but if
`config.getLogCompactionBlocksThreshold()` ever returns `0`, the expression `0
>= 0` evaluates to `true` — causing a new log compaction to be incorrectly
scheduled while one is already in-flight. An explicit early return is clearer
and threshold-agnostic:
```suggestion
Option<Pair<Integer, String>> latestDeltaCommitInfoSinceLogCompactOption
= getLatestDeltaCommitInfoSinceLogCompaction();
if (!latestDeltaCommitInfoSinceLogCompactOption.isPresent()) {
// Pending log compaction in progress — do not schedule another one
return false;
}
int numDeltaCommitsSinceLatestCompaction =
latestDeltaCommitInfoSinceCompact.getLeft();
int numDeltaCommitsSinceLatestLogCompaction =
latestDeltaCommitInfoSinceLogCompactOption.get().getLeft();
```
— *Greptile*
([original](https://github.com/yihua/hudi/pull/11#discussion_r3036084148))
(source:comment#3036084148)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]