huaxingao commented on code in PR #15470:
URL: https://github.com/apache/iceberg/pull/15470#discussion_r3230072777


##########
core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java:
##########
@@ -484,6 +510,31 @@ private static RewriteResult<DeleteFile> writeDeleteFileEntry(
     }
   }
 
+  private static long rewriteOrReuseStagedPositionDeleteFile(
+      DeleteFile file,
+      String stagingPath,
+      FileIO io,
+      PartitionSpec spec,
+      String sourcePrefix,
+      String targetPrefix,
+      PositionDeleteReaderWriter posDeleteReaderWriter) {
+    OutputFile outputFile = io.newOutputFile(stagingPath);
+    try {
+      return rewritePositionDeleteFile(
+          file, outputFile, io, spec, sourcePrefix, targetPrefix, posDeleteReaderWriter);
+    } catch (IOException e) {
+      throw new UncheckedIOException(
+          "Failed to rewrite position delete file " + file.location(), e);
+    } catch (UncheckedIOException e) {
+      // Another task in this Spark job already staged this file. Rewriting is deterministic, so
+      // its content (and therefore length) match what this task would have produced.
+      if (e.getCause() instanceof FileAlreadyExistsException) {
+        return io.newInputFile(stagingPath).getLength();

Review Comment:
    @steveloughran flagged the HDFS in-progress-write issue earlier: 
   ```
   There's actually something bad with hdfs here where the length of 
in-progress-writes can underreport length...it's only after close() that 
everything syncs up.
   ```
   
   I think the same concern applies here. If Task A is still writing when Task B 
reaches `getLength()`, B can read an in-progress length on HDFS that is shorter 
than the final file. That is the same class of bug this PR is fixing.
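
To make the race concrete, here is a minimal sketch of the same effect on a local filesystem. It is only an analogy and does not use HDFS or Iceberg's `FileIO`: a `BufferedOutputStream` stands in for Task A's unclosed writer, and `Files.size` stands in for Task B's `getLength()` call. The class name `InProgressLengthDemo` and the method `observeLengths` are hypothetical and exist only for this illustration.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class InProgressLengthDemo {

  // Returns {length seen while the writer is still open, length after close()}.
  // Assumption: a large buffer keeps the payload in memory until close(),
  // which mimics a writer whose data is not yet visible to other readers.
  static long[] observeLengths(int payloadSize) throws IOException {
    Path staged = Files.createTempFile("staged-pos-delete", ".bin");
    try {
      long inProgress;
      try (OutputStream out =
          new BufferedOutputStream(Files.newOutputStream(staged), 1 << 20)) {
        // "Task A" has written its payload but has not closed the stream yet.
        out.write(new byte[payloadSize]);
        // "Task B" asks for the length of the staged file at this point.
        inProgress = Files.size(staged);
      }
      // Only after close() does the reported length match what was written.
      long afterClose = Files.size(staged);
      return new long[] {inProgress, afterClose};
    } finally {
      Files.deleteIfExists(staged);
    }
  }

  public static void main(String[] args) throws IOException {
    long[] lengths = observeLengths(4096);
    System.out.println("in-progress=" + lengths[0] + ", after close=" + lengths[1]);
  }
}
```

The in-progress value is smaller than the value after `close()`, so a caller that records the first number would store a length that does not match the finished file. Whether HDFS behaves the same way in this code path depends on the `FileIO` implementation, so treat this as an illustration of the concern rather than a reproduction.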
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

