yihua commented on code in PR #18337:
URL: https://github.com/apache/hudi/pull/18337#discussion_r3066980244


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -94,6 +100,22 @@ private boolean needsCleaning(CleaningTriggerStrategy strategy) {
     }
   }
 
+  private HoodieCleanerPlan getEmptyCleanerPlan(Option<HoodieInstant> earliestInstant, CleanPlanner<T, I, K, O> planner) throws IOException {
+    HoodieCleanerPlan.Builder cleanBuilder = HoodieCleanerPlan.newBuilder()
+        .setFilePathsToBeDeletedPerPartition(Collections.emptyMap())
+        .setExtraMetadata(prepareExtraMetadata(planner.getSavepointedTimestamps()));
+    if (earliestInstant.isPresent()) {
+      HoodieInstant hoodieInstant = earliestInstant.get();
+      cleanBuilder.setPolicy(config.getCleanerPolicy().name())
+          .setVersion(CleanPlanner.LATEST_CLEAN_PLAN_VERSION)
+          .setEarliestInstantToRetain(new HoodieActionInstant(hoodieInstant.requestedTime(), hoodieInstant.getAction(), hoodieInstant.getState().name()))
+          .setLastCompletedCommitTimestamp(planner.getLastCompletedCommitTimestamp());
+    } else {
+      cleanBuilder.setPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name());
+    }
+    return cleanBuilder.build();

Review Comment:
   <a href="#"><img alt="P1" src="https://greptile-static-assets.s3.amazonaws.com/badges/p1.svg?v=7" align="top"></a> **`setVersion` is missing in the `else` branch of `getEmptyCleanerPlan`**
   
   When `earliestInstant` is absent, the builder never calls 
`.setVersion(CleanPlanner.LATEST_CLEAN_PLAN_VERSION)`, so the serialised plan 
will have the Avro-schema default (typically `1` or `0` depending on the schema 
evolution). When this plan is later read back by `CleanerUtils.getCleanerPlan`, 
a version mismatch can cause silent compatibility issues or an 
`UnsupportedOperationException` in version-dispatch logic.
   
   `setVersion` should be applied unconditionally:
   
   ```suggestion
       } else {
         cleanBuilder.setPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name())
             .setVersion(CleanPlanner.LATEST_CLEAN_PLAN_VERSION);
       }
   ```
   
   — *Greptile* 
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066975898)) 
(source:comment#3066975898)



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java:
##########
@@ -244,6 +244,17 @@ public class HoodieCleanConfig extends HoodieConfig {
          + "This can be useful for very large tables to avoid OOM issues during cleaning. "
          + "If both this config and " + CLEAN_PARTITION_FILTER_REGEX_KEY + " are set, the selected partitions take precedence.");
 
+  public static final ConfigProperty<Long> MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS = ConfigProperty
+      .key("hoodie.write.empty.clean.create.duration.ms")
+      .defaultValue(-1L)
+      .markAdvanced()
+      .withDocumentation("In some cases an empty clean commit needs to be created to ensure the clean planner "
+          + "does not look through the entire dataset if there are no clean plans. This is possible for append-only "
+          + "datasets. Also, for these datasets we cannot skip cleaning completely since in the future there could "
+          + "be upsert or replace operations. By creating an empty clean commit, the earliest_commit_to_retain value "
+          + "will be updated so that the clean planner only checks partitions modified after the "
+          + "last empty clean's earliest_commit_to_retain value, thereby optimizing clean planning.");
+

Review Comment:
   _⚠️ Potential issue_ | _🟡 Minor_
   
   <details>
   <summary>🧩 Analysis chain</summary>
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   set -euo pipefail
   
   echo "== Usages of empty-clean duration config =="
   rg -n -C3 --type=java 'MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS|maxDurationToCreateEmptyCleanMs|withMaxDurationToCreateEmptyClean'
   
   echo
   echo "== Inspect CleanPlanActionExecutor logic =="
   fd -i "CleanPlanActionExecutor.java" --exec sed -n '1,260p' {}
   
   echo
   echo "== Inspect validations in clean/write config =="
   fd -i "HoodieCleanConfig.java" --exec rg -n -C3 'MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS|withMaxDurationToCreateEmptyClean|build\(' {}
   fd -i "HoodieWriteConfig.java" --exec rg -n -C3 'maxDurationToCreateEmptyCleanMs|validate\(' {}
   ```
   
   Repository: yihua/hudi
   
   Length of output: 37609
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   set -euo pipefail
   
   echo "=== Check for all maxDurationToCreateEmptyCleanMs accesses ==="
   rg -n 'maxDurationToCreateEmptyCleanMs' --type=java
   
   echo
   echo "=== Verify no other executors or code paths use this config ==="
   rg -n 'MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS' --type=java
   
   echo
   echo "=== Check if this config can be loaded from properties directly ==="
   rg -n 'fromProperties\|fromFile' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java | head -20
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1473
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   set -euo pipefail
   
   echo "=== Check HoodieWriteConfig initialization and getLong method ==="
   rg -n 'public long getLong|private long getLong' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java -A 5 | head -30

   echo
   echo "=== Check if HoodieCleanConfig has any factory methods ==="
   fd -i "HoodieCleanConfig.java" --exec grep -n 'public static\|public HoodieCleanConfig' {} | head -20
   
   echo
   echo "=== Verify the build() method in HoodieCleanConfig ==="
   fd -i "HoodieCleanConfig.java" --exec sed -n '433,445p' {}
   ```
   
   Repository: yihua/hudi
   
   Length of output: 119
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   set -euo pipefail
   
   echo "=== Find the getLong implementation in config hierarchy ==="
   fd -i "HoodieConfig.java" --exec grep -n 'public long getLong' {} 
   
   echo
   echo "=== Check HoodieWriteConfig build method and validation ==="
   fd -i "HoodieWriteConfig.java" --exec sed -n '3789,3800p' {}
   
   echo
   echo "=== Full HoodieCleanConfig build method ==="
   fd -i "HoodieCleanConfig.java" --exec sed -n '433,450p' {}
   
   echo
   echo "=== Check if value is parsed/validated anywhere in build or getter ==="
   rg -n 'parseLong|Long.parseLong' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/ --type=java | grep -i 'empty\|clean' | head -10
   ```
   
   Repository: yihua/hudi
   
   Length of output: 113
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   set -euo pipefail
   
   echo "=== Search for AbstractHoodieConfig or base config class ==="
   fd 'AbstractHoodieConfig.java|HoodieConfig.java' --type f
   
   echo
   echo "=== Get full HoodieCleanConfig build method ==="
   cat hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java | sed -n '433,445p'

   echo
   echo "=== Check all references to getLong in config directory ==="
   find hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config -name '*.java' -exec grep -l 'getLong' {} \;

   echo
   echo "=== Search for where config value is actually retrieved (check if Long.parseLong is used) ==="
   rg 'Long.parseLong.*MAX_DURATION|getLong.*MAX_DURATION' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/ --type=java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1138
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   set -euo pipefail

   echo "=== Check HoodieConfig.getLong implementation ==="
   sed -n '1,500p' hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java | grep -A 10 'public.*getLong'

   echo
   echo "=== Verify how getLong handles Long.parseLong ==="
   rg -n 'Long.parseLong' hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java -A 3

   echo
   echo "=== Check if validation happens on config build in HoodieWriteConfig ==="
   sed -n '3789,3810p' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java

   echo
   echo "=== Verify if properties file can bypass builder validation ==="
   rg -n 'MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java -B 5 -A 5 | head -30
   ```
   
   Repository: yihua/hudi
   
   Length of output: 728
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   set -euo pipefail
   
   echo "=== Check HoodieConfig.getLong implementation ==="
   sed -n '1,500p' hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java | grep -A 10 'public.*getLong'

   echo
   echo "=== Verify how getLong handles Long.parseLong ==="
   rg -n 'Long.parseLong' hudi-common/src/main/java/org/apache/hudi/common/config/HoodieConfig.java -A 3

   echo
   echo "=== Check if validation happens on config build in HoodieWriteConfig ==="
   sed -n '3789,3810p' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java

   echo
   echo "=== Verify if properties file can bypass builder validation ==="
   rg -n 'MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java -B 5 -A 5 | head -30
   ```
   
   Repository: yihua/hudi
   
   Length of output: 3584
   
   ---
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
   set -euo pipefail
   
   echo "=== Check HoodieWriteConfig.validate() method to see if it validates MAX_DURATION ==="
   rg -n 'private void validate' hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java -A 100 | head -150
   ```
   
   Repository: yihua/hudi
   
   Length of output: 5864
   
   ---
   
   
   
   </details>
   
   **Consider adding validation for `MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS` to 
reject invalid ranges.**
   
   Line 429 accepts any `long` value without validation. While downstream code 
at line 251 safely gates the feature with 
`config.maxDurationToCreateEmptyCleanMs() > 0`, adding explicit validation 
would improve clarity and prevent silent misconfigurations. Enforce `duration 
== -1 || duration >= 0` either in the builder setter (line 428) or the 
`build()` method (line 433) to match the documented semantics.
   
   The proposed fix remains valid:
   
   <details>
   <summary>🔧 Suggested fix</summary>
   
   ```diff
       public HoodieCleanConfig.Builder withMaxDurationToCreateEmptyClean(long duration) {
   +     if (duration < -1) {
   +       throw new IllegalArgumentException(
   +           "hoodie.write.empty.clean.create.duration.ms must be -1 (disabled) or >= 0");
   +     }
         cleanConfig.setValue(MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS, String.valueOf(duration));
         return this;
       }
   ```
   </details>
   
   Also applies to: 433-439 (add similar validation to `build()` method to 
catch property-file configurations)
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java`
   around lines 247 - 257, Add explicit validation for
   MAX_DURATION_TO_CREATE_EMPTY_CLEAN_MS by enforcing the value is either -1 or 
>=
   0; in the builder setter method maxDurationToCreateEmptyCleanMs(...) validate
   the incoming long and throw an IllegalArgumentException for invalid values, 
and
   add the same check in the build() method to catch property-file or 
deserialized
   configurations before constructing the HoodieCleanConfig instance.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:poseidon:hawk:34ef15c8-abad-4f45-af37-5f628ba46abc -->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066979372)) 
(source:comment#3066979372)



##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieCleanerTestBase.java:
##########
@@ -100,18 +100,23 @@ protected List<HoodieCleanStat> runCleaner(
     return runCleaner(config, simulateRetryFailure, simulateMetadataFailure, 1, false);
   }
 
+  protected List<HoodieCleanStat> runCleaner(
+      HoodieWriteConfig config, boolean simulateRetryFailure, boolean simulateMetadataFailure,
+      Integer firstCommitSequence, boolean needInstantInHudiFormat) throws IOException {
+    SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(config);
+    String cleanInstantTs = needInstantInHudiFormat ? makeNewCommitTime(firstCommitSequence, "%014d") : makeNewCommitTime(firstCommitSequence, "%09d");
+    return runCleaner(config, simulateRetryFailure, simulateMetadataFailure, writeClient, cleanInstantTs);
+  }

Review Comment:
   _⚠️ Potential issue_ | _🟡 Minor_
   
   **Close the write client in this overload.**
   
   Line 106 allocates a `SparkRDDWriteClient` and returns without closing it. 
Repeated helper calls will leak client resources and can make the cleaner tests 
flaky.
   
   <details>
   <summary>♻️ Minimal fix</summary>
   
   ```diff
      protected List<HoodieCleanStat> runCleaner(
          HoodieWriteConfig config, boolean simulateRetryFailure, boolean simulateMetadataFailure,
          Integer firstCommitSequence, boolean needInstantInHudiFormat) throws IOException {
   -    SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(config);
   -    String cleanInstantTs = needInstantInHudiFormat ? makeNewCommitTime(firstCommitSequence, "%014d") : makeNewCommitTime(firstCommitSequence, "%09d");
   -    return runCleaner(config, simulateRetryFailure, simulateMetadataFailure, writeClient, cleanInstantTs);
   +    String cleanInstantTs = needInstantInHudiFormat
   +        ? makeNewCommitTime(firstCommitSequence, "%014d")
   +        : makeNewCommitTime(firstCommitSequence, "%09d");
   +    try (SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(config)) {
   +      return runCleaner(config, simulateRetryFailure, simulateMetadataFailure, writeClient, cleanInstantTs);
   +    }
      }
   ```
   </details>
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieCleanerTestBase.java`
   around lines 103 - 109, The overload of runCleaner creates a 
SparkRDDWriteClient
   via getHoodieWriteClient(config) and returns without closing it; update this
   method (the runCleaner overload that builds cleanInstantTs) to ensure the
   allocated writeClient is closed in all cases by calling writeClient.close() 
(or
   try-with-resources/try-finally) after delegating to the other 
runCleaner(String
   cleanInstantTs, SparkRDDWriteClient<?> writeClient, ...) so the client is 
always
   closed even on exceptions; reference the methods/variables: runCleaner(...),
   getHoodieWriteClient, writeClient, makeNewCommitTime, and ensure the close
   happens after runCleaner returns or in finally.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:a08fb0cb-fca0-4be4-8fd3-028689b3a295 
-->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066979380)) 
(source:comment#3066979380)



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java:
##########
@@ -238,6 +241,20 @@ private HoodieCleanMetadata runClean(HoodieTable<T, I, K, O> table, HoodieInstan
     }
   }
 
+  private static HoodieCleanMetadata createEmptyCleanMetadata(HoodieCleanerPlan cleanerPlan, HoodieInstant inflightInstant, long timeTakenMillis) {
+    return HoodieCleanMetadata.newBuilder()
+        .setStartCleanTime(inflightInstant.requestedTime())
+        .setTimeTakenInMillis(timeTakenMillis)
+        .setTotalFilesDeleted(0)
+        .setLastCompletedCommitTimestamp(cleanerPlan.getLastCompletedCommitTimestamp())
+        .setEarliestCommitToRetain(cleanerPlan.getEarliestInstantToRetain().getTimestamp())

Review Comment:
   <a href="#"><img alt="P1" src="https://greptile-static-assets.s3.amazonaws.com/badges/p1.svg?v=7" align="top"></a> **Potential NullPointerException on `getEarliestInstantToRetain()`**
   
   `createEmptyCleanMetadata` is reached whenever `cleanStats.isEmpty()`. 
`cleanStats` is built by streaming over 
`cleanerPlan.getFilePathsToBeDeletedPerPartition().keySet()`, so it is empty 
when the partition-file map is empty — which is exactly the case for any plan 
that has *only* `partitionsToBeDeleted` entries with no file paths.
   
   In the regular (non-empty-clean-commit) code path inside 
`requestClean(context)`:
   
   ```java
   return new HoodieCleanerPlan(
       earliestInstant.map(...).orElse(null),   // ← null when no earliest instant
       ...
   );
   ```
   
   If `earliestInstant` is absent (e.g. `KEEP_LATEST_FILE_VERSIONS` policy with 
no commits beyond the threshold) the plan's `EarliestInstantToRetain` field is 
null. If `partitionsToBeDeleted` is non-empty for that plan and there are no 
file paths, `cleanStats` will be empty, `createEmptyCleanMetadata` will be 
called, and line 250 will NPE:
   
   ```java
   .setEarliestCommitToRetain(cleanerPlan.getEarliestInstantToRetain().getTimestamp())
   //                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can be null
   ```
   
   ```suggestion
           .setEarliestCommitToRetain(cleanerPlan.getEarliestInstantToRetain() != null
               ? cleanerPlan.getEarliestInstantToRetain().getTimestamp() : null)
   ```
   
   — *Greptile* 
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066975828)) 
(source:comment#3066975828)



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java:
##########
@@ -213,17 +215,18 @@ private HoodieCleanMetadata runClean(HoodieTable<T, I, K, O> table, HoodieInstan
       }
 
       List<HoodieCleanStat> cleanStats = clean(context, cleanerPlan);
+      table.getMetaClient().reloadActiveTimeline();
+      HoodieCleanMetadata metadata;
       if (cleanStats.isEmpty()) {
-        return HoodieCleanMetadata.newBuilder().build();
+        metadata = createEmptyCleanMetadata(cleanerPlan, inflightInstant, timer.endTimer());
+      } else {
+        metadata = CleanerUtils.convertCleanMetadata(
+            inflightInstant.requestedTime(),
+            Option.of(timer.endTimer()),
+            cleanStats,
+            cleanerPlan.getExtraMetadata()
+        );

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   **Don't use `cleanStats.isEmpty()` as the empty-clean signal.**
   
   Line 220 now treats every stat-less execution as a no-op clean, but 
`clean()` still deletes `partitionsToBeDeleted` without emitting a 
`HoodieCleanStat` for them. After `CleanPlanActionExecutor` started treating 
partition deletes as real work, a partition-only clean will land here, get 
serialized as an empty clean, and lose its partition-delete metadata.
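   The decision described above can be sketched as a minimal, self-contained example. The `isTrulyEmptyClean` helper and the plain `List` parameters are hypothetical stand-ins for illustration; the real code would read the stats and the partition-delete list off `HoodieCleanerPlan`:
   
   ```java
   import java.util.Collections;
   import java.util.List;
   
   public class EmptyCleanCheck {
     // Hypothetical helper: an execution is a true no-op clean only when there
     // are no file-level clean stats AND no partition deletes recorded on the
     // plan. Partition-only cleans must NOT be recorded as empty cleans.
     static boolean isTrulyEmptyClean(List<?> cleanStats, List<String> partitionsToBeDeleted) {
       return cleanStats.isEmpty()
           && (partitionsToBeDeleted == null || partitionsToBeDeleted.isEmpty());
     }
   
     public static void main(String[] args) {
       // Partition-only clean: stats are empty but real work happened.
       System.out.println(isTrulyEmptyClean(
           Collections.emptyList(), Collections.singletonList("2020/01/01"))); // prints false
       // Nothing to do at all: safe to record an empty clean.
       System.out.println(isTrulyEmptyClean(
           Collections.emptyList(), Collections.emptyList())); // prints true
     }
   }
   ```
   
   With a check like this, a partition-only clean falls through to the regular `CleanerUtils.convertCleanMetadata` path instead of losing its partition-delete metadata.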
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java`
   around lines 218 - 228, The code in CleanActionExecutor should not treat
   cleanStats.isEmpty() alone as the signal for an empty clean because partition
   deletions (tracked on the CleanerPlan) are real work; change the conditional 
so
   that createEmptyCleanMetadata(...) is used only when both 
cleanStats.isEmpty()
   AND there are no partition deletes recorded on the cleanerPlan (e.g., check
   cleanerPlan.getPartitionsToDelete() or equivalent), otherwise call
   CleanerUtils.convertCleanMetadata(...) as before (passing
   inflightInstant.requestedTime() and Option.of(timer.endTimer())) so
   partition-delete metadata is preserved; update the logic around cleanStats,
   cleanerPlan, createEmptyCleanMetadata, and CleanerUtils.convertCleanMetadata
   accordingly.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:a08fb0cb-fca0-4be4-8fd3-028689b3a295 
-->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066979375)) 
(source:comment#3066979375)



##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestCleanPlanExecutor.java:
##########
@@ -813,4 +821,202 @@ public void testKeepXHoursWithCleaning(
       testTable.close();
     }
   }
+
+  @ParameterizedTest
+  @ValueSource(booleans = {true, false})
+  void testEmptyCleansAddedAfterThreshold(boolean secondCommitAfterThreshold) throws Exception {
+    boolean enableIncrementalClean = true;
+    boolean enableBootstrapSourceClean = false;
+    HoodieWriteConfig config = HoodieWriteConfig.newBuilder().withPath(basePath)
+        .withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(false).build())
+        .withCleanConfig(HoodieCleanConfig.newBuilder()
+            .withIncrementalCleaningMode(enableIncrementalClean)
+            .withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.EAGER)
+            .withCleanBootstrapBaseFileEnabled(enableBootstrapSourceClean)
+            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_BY_HOURS).cleanerNumHoursRetained(2)
+            .withMaxDurationToCreateEmptyClean(TimeUnit.MINUTES.toMillis(60))
+            .build())
+        .build();
+
+    HoodieTestTable testTable = HoodieTestTable.of(metaClient);
+    try {
+      String p0 = "2020/01/01";
+
+      String file1P0C0 = UUID.randomUUID().toString();
+      Instant instant = Instant.now();
+      ZonedDateTime commitDateTime = ZonedDateTime.ofInstant(instant, metaClient.getTableConfig().getTimelineTimezone().getZoneId());
+      int minutesForFirstCommit = 180;
+      String firstCommitTs = HoodieInstantTimeGenerator.formatDate(Date.from(commitDateTime.minusMinutes(minutesForFirstCommit).toInstant()));
+
+      commitToTestTable(testTable, firstCommitTs, p0, file1P0C0);
+      testTable = tearDownTestTableAndReinit(testTable, config);
+
+      // make next commit, with 1 insert & 1 update per partition
+      String file2P0C1 = UUID.randomUUID().toString();
+      int minutesForSecondCommit = 150;
+      String secondCommitTs = HoodieInstantTimeGenerator.formatDate(Date.from(commitDateTime.minusMinutes(minutesForSecondCommit).toInstant()));
+      testTable = tearDownTestTableAndReinit(testTable, config);
+
+      commitToTestTable(testTable, secondCommitTs, p0, file2P0C1);
+      testTable = tearDownTestTableAndReinit(testTable, config);
+      metaClient = HoodieTableMetaClient.reload(metaClient);
+
+      // make next commit, with 1 insert per partition
+      int minutesForThirdCommit = 90;
+      String thirdCommitTs = HoodieInstantTimeGenerator.formatDate(Date.from(commitDateTime.minusMinutes(minutesForThirdCommit).toInstant()));
+      String file3P0C2 = UUID.randomUUID().toString();
+
+      testTable = tearDownTestTableAndReinit(testTable, config);
+
+      commitToTestTable(testTable, thirdCommitTs, p0, file3P0C2);
+      testTable = tearDownTestTableAndReinit(testTable, config);
+      metaClient = HoodieTableMetaClient.reload(metaClient);
+
+      // first empty clean can be generated since earliest instant to retain will be the first commit (always keep last two instants at a minimum)
+      String firstCleanInstant = HoodieInstantTimeGenerator.formatDate(Date.from(commitDateTime.minus(secondCommitAfterThreshold ? 70 : 30, ChronoUnit.MINUTES).toInstant()));
+
+      SparkRDDWriteClient<?> writeClient = getHoodieWriteClient(config);
+      List<HoodieCleanStat> hoodieCleanStatsThree = runCleaner(config, false, false, writeClient, firstCleanInstant);
+      assertEquals(0, hoodieCleanStatsThree.size(), "Must not scan any partitions and clean any files");
+      assertEquals(1, metaClient.reloadActiveTimeline().getCleanerTimeline().filterCompletedInstants().countInstants());
+      String actualFirst = metaClient.getActiveTimeline().getCleanerTimeline().lastInstant().get().requestedTime();
+      writeClient.close();
+
+      String file4P0C1 = UUID.randomUUID().toString();
+      int minutesForFourthCommit = 10;
+      String fourthCommitTs = HoodieInstantTimeGenerator.formatDate(Date.from(commitDateTime.minusMinutes(minutesForFourthCommit).toInstant()));
+      testTable = tearDownTestTableAndReinit(testTable, config);
+
+      commitToTestTable(testTable, fourthCommitTs, p0, file4P0C1);
+      testTable = tearDownTestTableAndReinit(testTable, config);
+
+      // add a savepoint
+      getHoodieWriteClient(config).savepoint(fourthCommitTs, "user", "comment");

Review Comment:
   _⚠️ Potential issue_ | _🟡 Minor_
   
   **Close the savepoint client.**
   
   Line 894 creates a write client just to issue a savepoint and never closes 
it. This test already churns through multiple clients, so leaking one here is 
enough to leave background resources behind.
   
   <details>
   <summary>♻️ Minimal fix</summary>
   
   ```diff
   -      getHoodieWriteClient(config).savepoint(fourthCommitTs, "user", "comment");
   +      try (SparkRDDWriteClient<?> savepointClient = getHoodieWriteClient(config)) {
   +        savepointClient.savepoint(fourthCommitTs, "user", "comment");
   +        savepointClient.savepoint(fourthCommitTs, "user", "comment");
   +      }
   ```
   </details>
   
   <!-- suggestion_start -->
   
   <details>
   <summary>📝 Committable suggestion</summary>
   
   > ‼️ **IMPORTANT**
   > Carefully review the code before committing. Ensure that it accurately 
replaces the highlighted code, contains no missing lines, and has no issues 
with indentation. Thoroughly test & benchmark the code to ensure it meets the 
requirements.
   
   ```suggestion
         // add a savepoint
         try (SparkRDDWriteClient<?> savepointClient = getHoodieWriteClient(config)) {
           savepointClient.savepoint(fourthCommitTs, "user", "comment");
         }
   ```
   
   </details>
   
   <!-- suggestion_end -->
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestCleanPlanExecutor.java`
   around lines 893 - 894, The test creates a write client just to call 
savepoint
   via getHoodieWriteClient(config).savepoint(...) but never closes that client;
   change the code to obtain the HoodieWriteClient instance (e.g.,
   HoodieWriteClient client = getHoodieWriteClient(config)), call
   client.savepoint(fourthCommitTs, "user", "comment"), and then close the 
client
   (client.close()) or use a try-with-resources block to ensure the write 
client is
   always closed to release background resources.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:a08fb0cb-fca0-4be4-8fd3-028689b3a295 
-->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/31#discussion_r3066979376)) 
(source:comment#3066979376)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
