aokolnychyi commented on code in PR #9724:
URL: https://github.com/apache/iceberg/pull/9724#discussion_r1498618397


##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -507,4 +645,54 @@ public int totalGroupCount() {
       return totalGroupCount;
     }
   }
+
+  private static class MakeDeleteFile implements MapFunction<Row, DeleteFile> {
+
+    private final boolean posDeletes;
+    private final Types.StructType partitionType;
+    private final Map<Integer, PartitionSpec> specsById;
+
+    /**
+     * Map function that transforms entries table rows into {@link DeleteFile}
+     *
+     * @param posDeletes true for position deletes, false for equality deletes
+     * @param partitionType partition type of table
+     * @param specsById table's partition specs
+     */
+    MakeDeleteFile(
+        boolean posDeletes, Types.StructType partitionType, Map<Integer, PartitionSpec> specsById) {
+      this.posDeletes = posDeletes;
+      this.partitionType = partitionType;
+      this.specsById = specsById;
+    }
+
+    @Override
+    public DeleteFile call(Row row) throws Exception {
+      PartitionData partition = new PartitionData(partitionType);
+      GenericRowWithSchema partitionRow = row.getAs(0);
+
+      for (int i = 0; i < partitionRow.length(); i++) {
+        partition.set(i, partitionRow.get(i));
+      }
+
+      int specId = row.getAs(1);
+      String path = row.getAs(2);
+      long fileSize = row.getAs(3);
+      long recordCount = row.getAs(4);
+
+      FileMetadata.Builder builder = FileMetadata.deleteFileBuilder(specsById.get(specId));

Review Comment:
   Deleting based on path is not a good idea as Iceberg won't be able to prune manifests using partition info. The action for rewriting manifests already handles this, we can use a similar approach.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
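The pruning concern in the review comment can be illustrated with a toy model. This is a minimal sketch and not Iceberg code: `ManifestPruningSketch`, `Manifest`, `mayContain`, and both `manifestsOpened...` methods are hypothetical names, partitions are reduced to a single integer, and the path-only delete is assumed to open every manifest. It only shows why a delete that carries a partition value can skip manifests whose partition range excludes it, while a path-only delete cannot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ManifestPruningSketch {

  /** Hypothetical stand-in for a manifest: file paths plus the partition range it covers. */
  public static final class Manifest {
    final int minPartition;
    final int maxPartition;
    final List<String> paths;

    public Manifest(int minPartition, int maxPartition, String... filePaths) {
      this.minPartition = minPartition;
      this.maxPartition = maxPartition;
      this.paths = new ArrayList<>(Arrays.asList(filePaths));
    }

    /** True if the partition value falls inside this manifest's partition range. */
    boolean mayContain(int partition) {
      return partition >= minPartition && partition <= maxPartition;
    }
  }

  /** Path-only delete: no partition is known, so (in this model) every manifest is opened. */
  public static int manifestsOpenedForPathDelete(List<Manifest> manifests, String path) {
    int opened = 0;
    for (Manifest manifest : manifests) {
      opened++; // entries must be read to check whether the path is present
      manifest.paths.remove(path);
    }
    return opened;
  }

  /** Partition-aware delete: manifests whose range excludes the partition are skipped. */
  public static int manifestsOpenedForPartitionedDelete(
      List<Manifest> manifests, String path, int partition) {
    int opened = 0;
    for (Manifest manifest : manifests) {
      if (!manifest.mayContain(partition)) {
        continue; // pruned using partition info alone, without reading entries
      }
      opened++;
      manifest.paths.remove(path);
    }
    return opened;
  }

  /** Builds three sample manifests covering partitions 0-9, 10-19, and 20-29. */
  public static List<Manifest> sampleManifests() {
    return Arrays.asList(
        new Manifest(0, 9, "data/part=3/a.parquet"),
        new Manifest(10, 19, "data/part=15/b.parquet"),
        new Manifest(20, 29, "data/part=27/c.parquet"));
  }

  public static void main(String[] args) {
    String path = "data/part=15/b.parquet";
    System.out.println(
        "path-only delete opened: " + manifestsOpenedForPathDelete(sampleManifests(), path));
    System.out.println(
        "partition-aware delete opened: "
            + manifestsOpenedForPartitionedDelete(sampleManifests(), path, 15));
  }
}
```

With the sample data, the path-only delete opens all three manifests and the partition-aware delete opens one. In Iceberg itself the analogous information would come from the partition tuple and spec carried by the file object, which is why the comment prefers passing full file metadata over a bare path.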
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org