yihua commented on code in PR #18337:
URL: https://github.com/apache/hudi/pull/18337#discussion_r3066980244
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -94,6 +100,22 @@ private boolean needsCleaning(CleaningTriggerStrategy strategy) {
     }
   }
 
+  private HoodieCleanerPlan getEmptyCleanerPlan(Option<HoodieInstant> earliestInstant, CleanPlanner<T, I, K, O> planner) throws IOException {
+    HoodieCleanerPlan.Builder cleanBuilder = HoodieCleanerPlan.newBuilder()
+        .setFilePathsToBeDeletedPerPartition(Collections.emptyMap())
+        .setExtraMetadata(prepareExtraMetadata(planner.getSavepointedTimestamps()));
+    if (earliestInstant.isPresent()) {
+      HoodieInstant hoodieInstant = earliestInstant.get();
+      cleanBuilder.setPolicy(config.getCleanerPolicy().name())
+          .setVersion(CleanPlanner.LATEST_CLEAN_PLAN_VERSION)
+          .setEarliestInstantToRetain(new HoodieActionInstant(hoodieInstant.requestedTime(), hoodieInstant.getAction(), hoodieInstant.getState().name()))
+          .setLastCompletedCommitTimestamp(planner.getLastCompletedCommitTimestamp());
+    } else {
+      cleanBuilder.setPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name());
+    }
+    return cleanBuilder.build();
Review Comment:
**P1: `setVersion` is missing in the `else` branch of `getEmptyCleanerPlan`**
When `earliestInstant` is absent, the builder never calls
`.setVersion(CleanPlanner.LATEST_CLEAN_PLAN_VERSION)`, so the serialised plan
will have the Avro-schema default (typically `1` or `0` depending on the schema
evolution). When this plan is later read back by `CleanerUtils.getCleanerPlan`,
a version mismatch can cause silent compatibility issues or an
`UnsupportedOperationException` in version-dispatch logic.
`setVersion` should be applied unconditionally:
```suggestion
    } else {
      cleanBuilder.setPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name())
          .setVersion(CleanPlanner.LATEST_CLEAN_PLAN_VERSION);
    }
```
— *Greptile*
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066975898))
(source:comment#3066975898)
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java:
##########
@@ -244,6 +244,17 @@ public class HoodieCleanConfig extends HoodieConfig {
           + "This can be useful for very large tables to avoid OOM issues during cleaning. "
           + "If both this config and " + CLEAN_PARTITION_FILTER_REGEX_KEY + " are set, the selected partitions take precedence.");
 
+  public static final ConfigProperty<Long> MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS = ConfigProperty
+      .key("hoodie.write.empty.clean.create.duration.ms")
+      .defaultValue(-1L)
+      .markAdvanced()
+      .withDocumentation("In some cases an empty clean commit needs to be created to ensure the clean planner "
+          + "does not look through the entire dataset when there are no clean plans. This is possible for append-only "
+          + "datasets. Also, for these datasets we cannot skip cleaning entirely since in the future there could "
+          + "be upsert or replace operations. By creating an empty clean commit, the earliest_commit_to_retain value "
+          + "will be updated so that the clean planner only needs to check partitions modified after the "
+          + "last empty clean's earliest_commit_to_retain value, thereby optimizing the clean planning.");
+
Review Comment:
_⚠️ Potential issue_ | _🟡 Minor_
<details>
<summary>🧩 Analysis chain</summary>
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "== Usages of empty-clean duration config =="
rg -n -C3 --type=java 'MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS|maxDurationToCreateEmptyCleanMs|withMaxDurationToCreateEmptyClean'
echo
echo "== Inspect CleanPlanActionExecutor logic =="
fd -i "CleanPlanActionExecutor.java" --exec sed -n '1,260p' {}
echo
echo "== Inspect validations in clean/write config =="
fd -i "HoodieCleanConfig.java" --exec rg -n -C3 'MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS|withMaxDurationToCreateEmptyClean|build\(' {}
fd -i "HoodieWriteConfig.java" --exec rg -n -C3 'maxDurationToCreateEmptyCleanMs|validate\(' {}
```
Repository: yihua/hudi
Length of output: 37609
---
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "=== Check for all maxDurationToCreateEmptyCleanMs accesses ==="
rg -n 'maxDurationToCreateEmptyCleanMs' --type=java
echo
echo "=== Verify no other executors or code paths use this config ==="
rg -n 'MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS' --type=java
echo
echo "=== Check if this config can be loaded from properties directly ==="
rg -n 'fromProperties\|fromFile' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java | head -20
```
Repository: yihua/hudi
Length of output: 1473
---
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "=== Check HoodieWriteConfig initialization and getLong method ==="
rg -n 'public long getLong|private long getLong' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java -A 5 | head -30
echo
echo "=== Check if HoodieCleanConfig has any factory methods ==="
fd -i "HoodieCleanConfig.java" --exec grep -n 'public static\|public HoodieCleanConfig' {} | head -20
echo
echo "=== Verify the build() method in HoodieCleanConfig ==="
fd -i "HoodieCleanConfig.java" --exec sed -n '433,445p' {}
```
Repository: yihua/hudi
Length of output: 119
---
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "=== Find the getLong implementation in config hierarchy ==="
fd -i "HoodieConfig.java" --exec grep -n 'public long getLong' {}
echo
echo "=== Check HoodieWriteConfig build method and validation ==="
fd -i "HoodieWriteConfig.java" --exec sed -n '3789,3800p' {}
echo
echo "=== Full HoodieCleanConfig build method ==="
fd -i "HoodieCleanConfig.java" --exec sed -n '433,450p' {}
echo
echo "=== Check if value is parsed/validated anywhere in build or getter ==="
rg -n 'parseLong|Long.parseLong' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/ --type=java | grep -i 'empty\|clean' | head -10
```
Repository: yihua/hudi
Length of output: 113
---
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "=== Search for AbstractHoodieConfig or base config class ==="
fd 'AbstractHoodieConfig.java|HoodieConfig.java' --type f
echo
echo "=== Get full HoodieCleanConfig build method ==="
cat hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java | sed -n '433,445p'
echo
echo "=== Check all references to getLong in config directory ==="
find hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config -name '*.java' -exec grep -l 'getLong' {} \;
echo
echo "=== Search for where config value is actually retrieved (check if Long.parseLong is used) ==="
rg 'Long.parseLong.*MAX_DURATION|getLong.*MAX_DURATION' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/ --type=java
```
Repository: yihua/hudi
Length of output: 1138
---
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "=== Check HoodieConfig.getLong implementation ==="
sed -n '1,500p' hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java | grep -A 10 'public.*getLong'
echo
echo "=== Verify how getLong handles Long.parseLong ==="
rg -n 'Long.parseLong' hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java -A 3
echo
echo "=== Check if validation happens on config build in HoodieWriteConfig ==="
sed -n '3789,3810p' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
echo
echo "=== Verify if properties file can bypass builder validation ==="
rg -n 'MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java -B 5 -A 5 | head -30
```
Repository: yihua/hudi
Length of output: 3584
---
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "=== Check HoodieWriteConfig.validate() method to see if it validates MAX_DURATION ==="
rg -n 'private void validate' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java -A 100 | head -150
```
Repository: yihua/hudi
Length of output: 5864
---
</details>
**Consider adding validation for `MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS` to
reject invalid ranges.**
Line 429 accepts any `long` value without validation. While downstream code
at line 251 safely gates the feature with
`config.maxDurationToCreateEmptyCleanMs() > 0`, adding explicit validation
would improve clarity and prevent silent misconfigurations. Enforce `duration
== -1 || duration >= 0` either in the builder setter (line 428) or the
`build()` method (line 433) to match the documented semantics.
The proposed fix remains valid:
<details>
<summary>🔧 Suggested fix</summary>
```diff
 public HoodieCleanConfig.Builder withMaxDurationToCreateEmptyClean(long duration) {
+  if (duration < -1) {
+    throw new IllegalArgumentException(
+        "hoodie.write.empty.clean.create.duration.ms must be -1 (disabled) or >= 0");
+  }
   cleanConfig.setValue(MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS, String.valueOf(duration));
   return this;
 }
```
</details>
Also applies to: 433-439 (add similar validation to `build()` method to
catch property-file configurations)
<details>
<summary>🤖 Prompt for AI Agents</summary>
```
Verify each finding against the current code and only fix it if needed.
In `@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java`
around lines 247 - 257: add explicit validation for
MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS by enforcing the value is either -1 or >= 0;
in the builder setter method maxDurationToCreateEmptyCleanMs(...) validate the
incoming long and throw an IllegalArgumentException for invalid values, and add
the same check in the build() method to catch property-file or deserialized
configurations before constructing the HoodieCleanConfig instance.
```
</details>
<!-- This is an auto-generated comment by CodeRabbit -->
— *CodeRabbit*
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066979372))
(source:comment#3066979372)
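[Editor's note] The validate-in-both-places pattern the bot proposes can be sketched in isolation. This is a minimal illustration, not Hudi's actual builder: the class and method names below are hypothetical stand-ins for `HoodieCleanConfig.Builder` and its setter.

```java
// Illustrative stand-in for the suggested pattern; not Hudi's actual builder.
public class DurationConfigSketch {
  private long maxDurationMs = -1L; // -1 means the feature is disabled

  // Builder-style setter: fail fast on values below -1
  public DurationConfigSketch withMaxDurationMs(long duration) {
    validateDuration(duration);
    this.maxDurationMs = duration;
    return this;
  }

  // Re-validate in build() so values injected from a properties file
  // (bypassing the setter) are also caught before the config is used
  public DurationConfigSketch build() {
    validateDuration(maxDurationMs);
    return this;
  }

  static void validateDuration(long duration) {
    if (duration < -1) {
      throw new IllegalArgumentException(
          "duration must be -1 (disabled) or >= 0, got: " + duration);
    }
  }

  public long getMaxDurationMs() {
    return maxDurationMs;
  }
}
```

Duplicating the check in `build()` is what catches property-file configurations, which never pass through the typed setter.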
##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieCleanerTestBase.java:
##########
@@ -100,18 +100,23 @@ protected List<HoodieCleanStat> runCleaner(
     return runCleaner(config, simulateRetryFailure, simulateMetadataFailure, 1, false);
   }
 
+  protected List<HoodieCleanStat> runCleaner(
+      HoodieWriteConfig config, boolean simulateRetryFailure, boolean simulateMetadataFailure,
+      Integer firstCommitSequence, boolean needInstantInHudiFormat) throws IOException {
+    SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(config);
+    String cleanInstantTs = needInstantInHudiFormat ? makeNewCommitTime(firstCommitSequence, "%014d") : makeNewCommitTime(firstCommitSequence, "%09d");
+    return runCleaner(config, simulateRetryFailure, simulateMetadataFailure, writeClient, cleanInstantTs);
+  }
Review Comment:
_⚠️ Potential issue_ | _🟡 Minor_
**Close the write client in this overload.**
Line 106 allocates a `SparkRDDWriteClient` and returns without closing it.
Repeated helper calls will leak client resources and can make the cleaner tests
flaky.
<details>
<summary>♻️ Minimal fix</summary>
```diff
   protected List<HoodieCleanStat> runCleaner(
       HoodieWriteConfig config, boolean simulateRetryFailure, boolean simulateMetadataFailure,
       Integer firstCommitSequence, boolean needInstantInHudiFormat) throws IOException {
-    SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(config);
-    String cleanInstantTs = needInstantInHudiFormat ? makeNewCommitTime(firstCommitSequence, "%014d") : makeNewCommitTime(firstCommitSequence, "%09d");
-    return runCleaner(config, simulateRetryFailure, simulateMetadataFailure, writeClient, cleanInstantTs);
+    String cleanInstantTs = needInstantInHudiFormat
+        ? makeNewCommitTime(firstCommitSequence, "%014d")
+        : makeNewCommitTime(firstCommitSequence, "%09d");
+    try (SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(config)) {
+      return runCleaner(config, simulateRetryFailure, simulateMetadataFailure, writeClient, cleanInstantTs);
+    }
   }
```
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
```
Verify each finding against the current code and only fix it if needed.
In `@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieCleanerTestBase.java`
around lines 103 - 109: the overload of runCleaner creates a SparkRDDWriteClient
via getHoodieWriteClient(config) and returns without closing it. Update this
method (the runCleaner overload that builds cleanInstantTs) to ensure the
allocated writeClient is closed in all cases by calling writeClient.close() (or
using try-with-resources/try-finally) after delegating to the other
runCleaner(String cleanInstantTs, SparkRDDWriteClient<?> writeClient, ...) so
the client is always closed even on exceptions. Reference the methods/variables
runCleaner(...), getHoodieWriteClient, writeClient, and makeNewCommitTime, and
ensure the close happens after runCleaner returns or in a finally block.
```
</details>
— *CodeRabbit*
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066979380))
(source:comment#3066979380)
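[Editor's note] The leak described above is the generic AutoCloseable pattern, independent of Spark. A minimal self-contained sketch, where `FakeClient` is a hypothetical stand-in for `SparkRDDWriteClient`:

```java
// Minimal illustration of the resource-leak pattern; FakeClient is a
// hypothetical stand-in for SparkRDDWriteClient.
public class CloseableSketch {
  static class FakeClient implements AutoCloseable {
    boolean closed = false;
    void doWork() { /* pretend to run a cleaner */ }
    @Override public void close() { closed = true; }
  }

  // Leaky variant: the helper returns without ever closing the client
  static FakeClient leakyHelper() {
    FakeClient client = new FakeClient();
    client.doWork();
    return client; // caller observes closed == false
  }

  // Fixed variant: try-with-resources closes the client on every exit path,
  // including exceptions thrown by doWork()
  static FakeClient fixedHelper() {
    try (FakeClient client = new FakeClient()) {
      client.doWork();
      return client; // close() runs as the try block exits
    }
  }
}
```

Note that in the real fix the client must stay open while the delegated `runCleaner` call executes, which is exactly what wrapping the delegation inside the try block achieves.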
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java:
##########
@@ -238,6 +241,20 @@ private HoodieCleanMetadata runClean(HoodieTable<T, I, K, O> table, HoodieInstan
     }
   }
 
+  private static HoodieCleanMetadata createEmptyCleanMetadata(HoodieCleanerPlan cleanerPlan, HoodieInstant inflightInstant, long timeTakenMillis) {
+    return HoodieCleanMetadata.newBuilder()
+        .setStartCleanTime(inflightInstant.requestedTime())
+        .setTimeTakenInMillis(timeTakenMillis)
+        .setTotalFilesDeleted(0)
+        .setLastCompletedCommitTimestamp(cleanerPlan.getLastCompletedCommitTimestamp())
+        .setEarliestCommitToRetain(cleanerPlan.getEarliestInstantToRetain().getTimestamp())
Review Comment:
**P1: Potential NullPointerException on `getEarliestInstantToRetain()`**
`createEmptyCleanMetadata` is reached whenever `cleanStats.isEmpty()`.
`cleanStats` is built by streaming over
`cleanerPlan.getFilePathsToBeDeletedPerPartition().keySet()`, so it is empty
when the partition-file map is empty — which is exactly the case for any plan
that has *only* `partitionsToBeDeleted` entries with no file paths.
In the regular (non-empty-clean-commit) code path inside
`requestClean(context)`:
```java
return new HoodieCleanerPlan(
    earliestInstant.map(...).orElse(null),  // ← null when no earliest instant
    ...
);
```
If `earliestInstant` is absent (e.g. `KEEP_LATEST_FILE_VERSIONS` policy with
no commits beyond the threshold) the plan's `EarliestInstantToRetain` field is
null. If `partitionsToBeDeleted` is non-empty for that plan and there are no
file paths, `cleanStats` will be empty, `createEmptyCleanMetadata` will be
called, and line 250 will NPE:
```java
.setEarliestCommitToRetain(cleanerPlan.getEarliestInstantToRetain().getTimestamp())
//                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can be null
```
```suggestion
        .setEarliestCommitToRetain(cleanerPlan.getEarliestInstantToRetain() != null
            ? cleanerPlan.getEarliestInstantToRetain().getTimestamp() : null)
```
— *Greptile*
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066975828))
(source:comment#3066975828)
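[Editor's note] The suggested guard is plain null-safe dereferencing. A self-contained sketch, with hypothetical stand-ins for the Avro-generated `HoodieActionInstant`/`HoodieCleanerPlan` classes:

```java
// Hypothetical stand-ins for the Avro-generated classes, to show the
// null guard in isolation.
public class NullGuardSketch {
  static class ActionInstant {
    final String timestamp;
    ActionInstant(String timestamp) { this.timestamp = timestamp; }
    String getTimestamp() { return timestamp; }
  }

  static class Plan {
    final ActionInstant earliestInstantToRetain; // may be null for some plans
    Plan(ActionInstant e) { this.earliestInstantToRetain = e; }
    ActionInstant getEarliestInstantToRetain() { return earliestInstantToRetain; }
  }

  // Mirrors the suggested fix: propagate null instead of throwing an NPE
  static String earliestCommitToRetain(Plan plan) {
    return plan.getEarliestInstantToRetain() != null
        ? plan.getEarliestInstantToRetain().getTimestamp()
        : null;
  }
}
```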
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java:
##########
@@ -213,17 +215,18 @@ private HoodieCleanMetadata runClean(HoodieTable<T, I, K, O> table, HoodieInstan
     }
     List<HoodieCleanStat> cleanStats = clean(context, cleanerPlan);
+    table.getMetaClient().reloadActiveTimeline();
+    HoodieCleanMetadata metadata;
     if (cleanStats.isEmpty()) {
-      return HoodieCleanMetadata.newBuilder().build();
+      metadata = createEmptyCleanMetadata(cleanerPlan, inflightInstant, timer.endTimer());
+    } else {
+      metadata = CleanerUtils.convertCleanMetadata(
+          inflightInstant.requestedTime(),
+          Option.of(timer.endTimer()),
+          cleanStats,
+          cleanerPlan.getExtraMetadata()
+      );
Review Comment:
_⚠️ Potential issue_ | _🟠 Major_
**Don't use `cleanStats.isEmpty()` as the empty-clean signal.**
Line 220 now treats every stat-less execution as a no-op clean, but
`clean()` still deletes `partitionsToBeDeleted` without emitting a
`HoodieCleanStat` for them. After `CleanPlanActionExecutor` started treating
partition deletes as real work, a partition-only clean will land here, get
serialized as an empty clean, and lose its partition-delete metadata.
<details>
<summary>🤖 Prompt for AI Agents</summary>
```
Verify each finding against the current code and only fix it if needed.
In `@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java`
around lines 218 - 228: the code should not treat cleanStats.isEmpty() alone as
the signal for an empty clean, because partition deletions (tracked on the
CleanerPlan) are real work. Change the conditional so that
createEmptyCleanMetadata(...) is used only when both cleanStats.isEmpty() AND
there are no partition deletes recorded on the cleanerPlan (e.g., check
cleanerPlan.getPartitionsToDelete() or equivalent); otherwise call
CleanerUtils.convertCleanMetadata(...) as before (passing
inflightInstant.requestedTime() and Option.of(timer.endTimer())) so
partition-delete metadata is preserved. Update the logic around cleanStats,
cleanerPlan, createEmptyCleanMetadata, and CleanerUtils.convertCleanMetadata
accordingly.
```
</details>
— *CodeRabbit*
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066979375))
(source:comment#3066979375)
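[Editor's note] The combined condition the bot asks for can be sketched as a pure function. This is a hypothetical illustration that models only the two inputs the decision needs, not Hudi's actual types:

```java
import java.util.List;

// Hypothetical stand-in: only the two inputs the decision needs are modeled.
public class EmptyCleanSignalSketch {
  // An execution is a true no-op only when there are neither file-level
  // clean stats nor partition deletions recorded on the plan.
  static boolean isNoOpClean(List<String> cleanStats, List<String> partitionsToBeDeleted) {
    return cleanStats.isEmpty()
        && (partitionsToBeDeleted == null || partitionsToBeDeleted.isEmpty());
  }
}
```

With this predicate, a partition-only clean (empty stats, non-empty partition deletes) no longer falls into the empty-metadata branch.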
##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestCleanPlanExecutor.java:
##########
@@ -813,4 +821,202 @@ public void testKeepXHoursWithCleaning(
     testTable.close();
   }
 }
+
+  @ParameterizedTest
+  @ValueSource(booleans = {true, false})
+  void testEmptyCleansAddedAfterThreshold(boolean secondCommitAfterThreshold) throws Exception {
+    boolean enableIncrementalClean = true;
+    boolean enableBootstrapSourceClean = false;
+    HoodieWriteConfig config = HoodieWriteConfig.newBuilder().withPath(basePath)
+        .withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(false).build())
+        .withCleanConfig(HoodieCleanConfig.newBuilder()
+            .withIncrementalCleaningMode(enableIncrementalClean)
+            .withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.EAGER)
+            .withCleanBootstrapBaseFileEnabled(enableBootstrapSourceClean)
+            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_BY_HOURS).cleanerNumHoursRetained(2)
+            .withMaxDurationToCreateEmptyClean(TimeUnit.MINUTES.toMillis(60))
+            .build())
+        .build();
+
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    try {
+      String p0 = "2020/01/01";
+
+      String file1P0C0 = UUID.randomUUID().toString();
+      Instant instant = Instant.now();
+      ZonedDateTime commitDateTime = ZonedDateTime.ofInstant(instant, metaClient.getTableConfig().getTimelineTimezone().getZoneId());
+      int minutesForFirstCommit = 180;
+      String firstCommitTs = HoodieInstantTimeGenerator.formatDate(Date.from(commitDateTime.minusMinutes(minutesForFirstCommit).toInstant()));
+
+      commitToTestTable(testTable, firstCommitTs, p0, file1P0C0);
+      testTable = tearDownTestTableAndReinit(testTable, config);
+
+      // make next commit, with 1 insert & 1 update per partition
+      String file2P0C1 = UUID.randomUUID().toString();
+      int minutesForSecondCommit = 150;
+      String secondCommitTs = HoodieInstantTimeGenerator.formatDate(Date.from(commitDateTime.minusMinutes(minutesForSecondCommit).toInstant()));
+      testTable = tearDownTestTableAndReinit(testTable, config);
+
+      commitToTestTable(testTable, secondCommitTs, p0, file2P0C1);
+      testTable = tearDownTestTableAndReinit(testTable, config);
+      metaClient = HoodieTableMetaClient.reload(metaClient);
+
+      // make next commit, with 1 insert per partition
+      int minutesForThirdCommit = 90;
+      String thirdCommitTs = HoodieInstantTimeGenerator.formatDate(Date.from(commitDateTime.minusMinutes(minutesForThirdCommit).toInstant()));
+      String file3P0C2 = UUID.randomUUID().toString();
+
+      testTable = tearDownTestTableAndReinit(testTable, config);
+
+      commitToTestTable(testTable, thirdCommitTs, p0, file3P0C2);
+      testTable = tearDownTestTableAndReinit(testTable, config);
+      metaClient = HoodieTableMetaClient.reload(metaClient);
+
+      // first empty clean can be generated since earliest instant to retain will be the first commit (always keep last two instants at a minimum)
+      String firstCleanInstant = HoodieInstantTimeGenerator.formatDate(Date.from(commitDateTime.minus(secondCommitAfterThreshold ? 70 : 30, ChronoUnit.MINUTES).toInstant()));
+
+      SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(config);
+      List<HoodieCleanStat> hoodieCleanStatsThree = runCleaner(config, false, false, writeClient, firstCleanInstant);
+      assertEquals(0, hoodieCleanStatsThree.size(), "Must not scan any partitions and clean any files");
+      assertEquals(1, metaClient.reloadActiveTimeline().getCleanerTimeline().filterCompletedInstants().countInstants());
+      String actualFirst = metaClient.getActiveTimeline().getCleanerTimeline().lastInstant().get().requestedTime();
+      writeClient.close();
+
+      String file4P0C1 = UUID.randomUUID().toString();
+      int minutesForFourthCommit = 10;
+      String fourthCommitTs = HoodieInstantTimeGenerator.formatDate(Date.from(commitDateTime.minusMinutes(minutesForFourthCommit).toInstant()));
+      testTable = tearDownTestTableAndReinit(testTable, config);
+
+      commitToTestTable(testTable, fourthCommitTs, p0, file4P0C1);
+      testTable = tearDownTestTableAndReinit(testTable, config);
+
+      // add a savepoint
+      getHoodieWriteClient(config).savepoint(fourthCommitTs, "user", "comment");
Review Comment:
_⚠️ Potential issue_ | _🟡 Minor_
**Close the savepoint client.**
Line 894 creates a write client just to issue a savepoint and never closes
it. This test already churns through multiple clients, so leaking one here is
enough to leave background resources behind.
<details>
<summary>♻️ Minimal fix</summary>
```diff
-      getHoodieWriteClient(config).savepoint(fourthCommitTs, "user", "comment");
+      try (SparkRDDWriteClient<?> savepointClient = getHoodieWriteClient(config)) {
+        savepointClient.savepoint(fourthCommitTs, "user", "comment");
+      }
```
</details>
<!-- suggestion_start -->
<details>
<summary>📝 Committable suggestion</summary>
> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately
replaces the highlighted code, contains no missing lines, and has no issues
with indentation. Thoroughly test & benchmark the code to ensure it meets the
requirements.
```suggestion
      // add a savepoint
      try (SparkRDDWriteClient<?> savepointClient = getHoodieWriteClient(config)) {
        savepointClient.savepoint(fourthCommitTs, "user", "comment");
      }
```
</details>
<!-- suggestion_end -->
<details>
<summary>🤖 Prompt for AI Agents</summary>
```
Verify each finding against the current code and only fix it if needed.
In `@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestCleanPlanExecutor.java`
around lines 893 - 894: the test creates a write client just to call savepoint
via getHoodieWriteClient(config).savepoint(...) but never closes that client.
Change the code to obtain the write client instance (e.g., HoodieWriteClient
client = getHoodieWriteClient(config)), call
client.savepoint(fourthCommitTs, "user", "comment"), and then close the client
(client.close()) or use a try-with-resources block to ensure the write client
is always closed to release background resources.
```
</details>
— *CodeRabbit*
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066979376))
(source:comment#3066979376)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]