JonasJ-ap commented on code in PR #6880:
URL: https://github.com/apache/iceberg/pull/6880#discussion_r1117977464


##########
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##########
@@ -213,6 +218,52 @@ private PartitionSpec getPartitionSpecFromDeltaSnapshot(Schema schema) {
     return builder.build();
   }
 
+  /**
+   * Commit the initial delta snapshot to iceberg transaction. It tries the snapshot starting from
+   * {@code deltaStartVersion} to {@code latestVersion} and commit the first constructable one.
+   *
+   * <p>There are two cases that the delta snapshot is not constructable:
+   *
+   * <ul>
+   *   <li>the version is earlier than the earliest checkpoint
+   *   <li>the corresponding data files are deleted by {@code VACUUM}
+   * </ul>
+   *
+   * <p>For more information, please refer to delta lake's <a
+   * href="https://docs.delta.io/latest/delta-batch.html#-data-retention">Data Retention</a>
+   *
+   * @param latestVersion the latest version of the delta lake table
+   * @param transaction the iceberg transaction
+   * @return the initial version of the delta lake table that is successfully committed to iceberg
+   */
+  private long commitInitialDeltaSnapshotToIcebergTransaction(
+      long latestVersion, Transaction transaction) {
+    long constructableStartVersion = deltaStartVersion;
+    while (constructableStartVersion <= latestVersion) {
+      try {
+        List<AddFile> initDataFiles =
+            deltaLog.getSnapshotForVersionAsOf(constructableStartVersion).getAllFiles();
+        List<DataFile> filesToAdd = Lists.newArrayList();
+        for (AddFile addFile : initDataFiles) {
+          DataFile dataFile = buildDataFileFromAction(addFile, transaction.table());
+          filesToAdd.add(dataFile);
+        }
+
+        // AppendFiles case
+        AppendFiles appendFiles = transaction.newAppend();
+        filesToAdd.forEach(appendFiles::appendFile);
+        appendFiles.commit();
+
+        return constructableStartVersion;
+      } catch (NotFoundException | IllegalArgumentException | DeltaStandaloneException e) {

Review Comment:
   Thank you for pointing this out. `HadoopFileIO` rethrows `FileNotFoundException` as `NotFoundException` and any other `IOException` as `RuntimeIOException`:
   
   
https://github.com/apache/iceberg/blob/b5a31a14d56c1ee24bad87e1ac7f119d638ee320/core/src/main/java/org/apache/iceberg/hadoop/HadoopInputFile.java#L159-L170
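A minimal sketch of the translation pattern described above, not the linked `HadoopInputFile` code itself. It assumes local stand-ins for Iceberg's `NotFoundException` and `RuntimeIOException` (the real classes live in `org.apache.iceberg.exceptions`) and a hypothetical `IOSupplier` interface so that it compiles without Iceberg on the classpath.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for org.apache.iceberg.exceptions.NotFoundException.
class NotFoundException extends RuntimeException {
  NotFoundException(Throwable cause, String message, Object... args) {
    super(String.format(message, args), cause);
  }
}

// Hypothetical stand-in for org.apache.iceberg.exceptions.RuntimeIOException.
class RuntimeIOException extends UncheckedIOException {
  RuntimeIOException(IOException cause, String message, Object... args) {
    super(String.format(message, args), cause);
  }
}

final class ExceptionTranslation {
  private ExceptionTranslation() {}

  /** Hypothetical functional interface for an I/O action such as opening a stream. */
  @FunctionalInterface
  interface IOSupplier<T> {
    T get() throws IOException;
  }

  /**
   * Runs an I/O action and converts checked exceptions into unchecked ones. The specific
   * FileNotFoundException clause must come before the broader IOException clause.
   */
  static <T> T translate(String location, IOSupplier<T> action) {
    try {
      return action.get();
    } catch (FileNotFoundException e) {
      // Missing file (for example, removed by VACUUM): callers can catch this type alone.
      throw new NotFoundException(e, "File does not exist: %s", location);
    } catch (IOException e) {
      // Any other I/O failure is wrapped in a general unchecked exception.
      throw new RuntimeIOException(e, "Failed to read file: %s", location);
    }
  }
}
```

Because the two failure modes surface as different unchecked types, a caller can catch only `NotFoundException` and let every other I/O error propagate.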
   
   In this case, I think we only need to handle `NotFoundException`, since `VACUUM` may delete data files. I've added a code block to `buildDataFileFromAction` that explicitly checks whether the file exists and throws `NotFoundException` if it does not.
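The narrowed handling can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `requireExists` and `firstConstructableVersion` are hypothetical names, the `filesAtVersion` function stands in for `deltaLog.getSnapshotForVersionAsOf(version).getAllFiles()`, the `exists` predicate stands in for a file-existence check, and `NotFoundException` is a local stand-in for Iceberg's class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;
import java.util.function.Predicate;

// Hypothetical stand-in for org.apache.iceberg.exceptions.NotFoundException.
class NotFoundException extends RuntimeException {
  NotFoundException(String message, Object... args) {
    super(String.format(message, args));
  }
}

final class InitialSnapshotScan {
  private InitialSnapshotScan() {}

  /** Hypothetical explicit existence check: fail with NotFoundException when a file is gone. */
  static String requireExists(String path, Predicate<String> exists) {
    if (!exists.test(path)) {
      throw new NotFoundException("Data file %s does not exist; it may have been removed by VACUUM", path);
    }
    return path;
  }

  /**
   * Tries versions startVersion..latestVersion in order and returns the first whose data files
   * all exist, appending those files to filesToAdd. Only NotFoundException marks a version as
   * not constructable; any other exception propagates to the caller.
   */
  static long firstConstructableVersion(
      long startVersion,
      long latestVersion,
      LongFunction<List<String>> filesAtVersion,
      Predicate<String> exists,
      List<String> filesToAdd) {
    for (long version = startVersion; version <= latestVersion; version++) {
      try {
        List<String> candidate = new ArrayList<>();
        for (String path : filesAtVersion.apply(version)) {
          candidate.add(requireExists(path, exists));
        }
        // Reached only when every file of this version exists.
        filesToAdd.addAll(candidate);
        return version;
      } catch (NotFoundException e) {
        // A referenced data file is missing: try the next version.
      }
    }
    throw new IllegalStateException(
        String.format("No constructable version between %d and %d", startVersion, latestVersion));
  }
}
```

Collecting files into a local `candidate` list before adding them keeps a partially checked version from leaking into the result, which loosely mirrors building `filesToAdd` before calling `appendFiles.commit()` in the diff.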
   
   Please let me know if you have any other concern here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
