szehon-ho commented on code in PR #10020: URL: https://github.com/apache/iceberg/pull/10020#discussion_r1540213729
########## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ########## @@ -141,25 +147,88 @@ private Schema calculateSchema() { } } + /** + * Handle collisions between table and partition field ids, as both need to be part of position + * deletes table + * + * @param tableSchema original table schema + * @param partitionType original table's partition type + * @return partition type with reassigned field ids + */ + public static Types.StructType partitionType(Schema tableSchema, Types.StructType partitionType) { Review Comment: These and following are just helper methods to reasisgn partition field ids to prevent collision with schema's field ids. ########## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ########## @@ -196,15 +265,14 @@ protected List<String> scanColumns() { */ public BatchScan baseTableFilter(Expression expr) { return new PositionDeletesBatchScan( - table(), schema(), context(), Expressions.and(baseTableFilter, expr)); + table(), schema(), context(), Expressions.and(baseTableFilter, expr), fieldMap); } @Override protected CloseableIterable<ScanTask> doPlanFiles() { String schemaString = SchemaParser.toJson(tableSchema()); - - // prepare transformed partition specs and caches - Map<Integer, PartitionSpec> transformedSpecs = transformSpecs(tableSchema(), table().specs()); + Map<Integer, PartitionSpec> transformedSpecs = Review Comment: We use the transformed spec to evaluate the filters, so we need to bind against the new (reassigned) field ids, hence passing the field id map here. ########## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ########## @@ -278,28 +344,38 @@ public void close() throws IOException { @Override public CloseableIterator<ScanTask> iterator() { + // Partition filter by base table filter Expression partitionFilter = Projections.inclusive(spec, isCaseSensitive()).project(baseTableFilter); - // Filter partitions + // Read manifests (use original table's partition ids to de-serialize partition values) CloseableIterable<ManifestEntry<DeleteFile>> deleteFileEntries = - ManifestFiles.readDeleteManifest(manifest, table().io(), transformedSpecs) + ManifestFiles.readDeleteManifest(manifest, table().io(), table().specs()) Review Comment: Here is a bit tricky. We need to read manifests using the original field ids, so go back to use the original table's partition spec which preserves the original field ids. We can no longer use the spec to bind the user-provided partition filter though, because now those field ids are the old ones and do not match the reasisgned ones, so we need to split that out and evaluate it separately outside the ManifestReader. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org