zinking commented on code in PR #9724: URL: https://github.com/apache/iceberg/pull/9724#discussion_r1496914794
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -507,4 +645,54 @@ public int totalGroupCount() {
       return totalGroupCount;
     }
   }
+
+  private static class MakeDeleteFile implements MapFunction<Row, DeleteFile> {
+
+    private final boolean posDeletes;
+    private final Types.StructType partitionType;
+    private final Map<Integer, PartitionSpec> specsById;
+
+    /**
+     * Map function that transforms entries table rows into {@link DeleteFile}
+     *
+     * @param posDeletes true for position deletes, false for equality deletes
+     * @param partitionType partition type of table
+     * @param specsById table's partition specs
+     */
+    MakeDeleteFile(
+        boolean posDeletes, Types.StructType partitionType, Map<Integer, PartitionSpec> specsById) {
+      this.posDeletes = posDeletes;
+      this.partitionType = partitionType;
+      this.specsById = specsById;
+    }
+
+    @Override
+    public DeleteFile call(Row row) throws Exception {
+      PartitionData partition = new PartitionData(partitionType);
+      GenericRowWithSchema partitionRow = row.getAs(0);
+
+      for (int i = 0; i < partitionRow.length(); i++) {
+        partition.set(i, partitionRow.get(i));
+      }
+
+      int specId = row.getAs(1);
+      String path = row.getAs(2);
+      long fileSize = row.getAs(3);
+      long recordCount = row.getAs(4);
+
+      FileMetadata.Builder builder = FileMetadata.deleteFileBuilder(specsById.get(specId));

Review Comment:
   I probably didn't explain it properly. The `partition` variable and `specsById.get(specId)` don't match under schema (partition) evolution: `partition` is built from the table's partition type, while the spec is looked up per file by `specId`.
   My fix is something like this:
   ```
   PartitionSpec partitionSpec = specsById.get(specId);
   List<String> partitionFields =
       partitionSpec.fields().stream().map(PartitionField::name).collect(Collectors.toList());
   PartitionData specPartData = partition.project(partitionFields);
   FileMetadata.Builder builder = FileMetadata.deleteFileBuilder(partitionSpec);
   ```
   Lastly, as you mentioned, we should make `delete(deletePath)` work and avoid all this overhead.
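
To make the mismatch concrete, here is a minimal sketch using plain Java collections rather than Iceberg classes. It assumes, as I read the comment, that the partition tuple in the entries row carries the fields of every spec the table has had, while a file written under an older spec only has that spec's fields. The class `PartitionProjectionSketch`, the method `projectToSpec`, and the field names `dt` and `bucket` are hypothetical and exist only for this illustration; they are not part of the PR or of the Iceberg API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the projection step; not Iceberg code.
final class PartitionProjectionSketch {

  private PartitionProjectionSketch() {}

  /**
   * Keeps only the values whose field names belong to the file's own spec,
   * returned in the order the spec lists them.
   */
  static List<Object> projectToSpec(
      Map<String, Object> tablePartition, List<String> specFieldNames) {
    List<Object> projected = new ArrayList<>(specFieldNames.size());
    for (String name : specFieldNames) {
      if (!tablePartition.containsKey(name)) {
        throw new IllegalArgumentException("Partition row has no field named " + name);
      }
      projected.add(tablePartition.get(name));
    }
    return projected;
  }

  public static void main(String[] args) {
    // Assumed table-wide partition tuple: one slot per field across all specs.
    Map<String, Object> tablePartition = new LinkedHashMap<>();
    tablePartition.put("dt", "2024-02-20");
    tablePartition.put("bucket", null); // added by a later spec, null for older files

    // Assumed older spec that only partitions by "dt".
    List<String> oldSpecFields = Arrays.asList("dt");

    // A positional copy would hand two values to a one-field spec;
    // the projection keeps just the one the spec knows about.
    List<Object> projected = projectToSpec(tablePartition, oldSpecFields);
    System.out.println(tablePartition.size() + " table fields, " + projected.size() + " spec fields");
  }
}
```

Projecting by field name instead of by position is the design choice here: positions in the table-wide tuple need not line up with positions in an individual spec once fields have been added or dropped.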