rraulinio opened a new issue, #1190: URL: https://github.com/apache/iceberg-go/issues/1190
### Apache Iceberg version

main (development)

### Please describe the bug 🐞

## Problem

Manifest-level pruning can read the wrong partition field summary after a partition source column is dropped from the current schema.

Example: A table is partitioned by two fields in this order:

```text
q_part, p_part
```

Manifest files store partition summaries positionally, so:

```text
slot 0 = q_part summary
slot 1 = p_part summary
```

If the table schema later drops column `q`, iceberg-go currently builds a compacted partition schema that only contains:

```text
slot 0 = p_part
```

But old manifest files still store summaries in the original order:

```text
slot 0 = q_part
slot 1 = p_part
```

So a query like:

```sql
WHERE p = 2
```

can accidentally compare `2` against `q_part`'s summary instead of `p_part`'s summary. If `q_part` has bounds like `10..20`, the manifest evaluator may conclude that `p = 2` cannot match and skip the manifest, even though matching data files exist.

## Expected Behavior

Dropping a source column should not shift the positions of later partition fields during manifest pruning.

Dropped-source partition fields should stay as inert placeholders, so manifest summary positions remain aligned with the partition spec:

```text
slot 0 = dropped q_part placeholder
slot 1 = p_part
```

This matches Java Iceberg's partition type behavior, where dropped partition-source fields are represented with `UnknownType` in the partition type.

## Impact

This can cause incorrect query results because matching manifests may be pruned incorrectly.

## Proposed Fix

When building the partition schema used for manifest evaluation, keep every partition field from the partition spec. If the source column no longer exists in the scan schema, use an `UnknownType` placeholder for that partition field instead of dropping it.
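The slot misalignment and the placeholder approach can be illustrated with a minimal, self-contained Go sketch. It is a toy model, not iceberg-go's actual evaluator or types: `fieldSummary`, `partitionField`, `slot`, `compactedSlots`, `alignedSlots`, and `mightMatchEq` are hypothetical names introduced only for illustration, and the `1..5` bounds for `p_part` are an assumed example value (the `10..20` bounds for `q_part` are taken from the description above).

```go
package main

import "fmt"

// fieldSummary is a toy stand-in for one positional partition summary
// stored in a manifest (inclusive lower and upper bounds).
type fieldSummary struct {
	lower, upper int
}

// partitionField is a toy stand-in for one entry of a partition spec.
type partitionField struct {
	name     string
	sourceID int
}

// slot is one position in the partition schema used for evaluation.
// unknown marks an inert placeholder for a dropped source column.
type slot struct {
	name    string
	unknown bool
}

// compactedSlots models the reported behavior: fields whose source
// column is gone are skipped, so later fields shift to lower positions.
func compactedSlots(spec []partitionField, live map[int]bool) []slot {
	var out []slot
	for _, f := range spec {
		if live[f.sourceID] {
			out = append(out, slot{name: f.name})
		}
	}
	return out
}

// alignedSlots models the proposed behavior: every spec field keeps its
// position, and dropped-source fields become placeholders.
func alignedSlots(spec []partitionField, live map[int]bool) []slot {
	out := make([]slot, 0, len(spec))
	for _, f := range spec {
		out = append(out, slot{name: f.name, unknown: !live[f.sourceID]})
	}
	return out
}

// mightMatchEq reports whether a manifest might contain rows where the
// named partition field equals value. The summary is read at the same
// index as the slot, which is where misalignment causes wrong pruning.
func mightMatchEq(slots []slot, summaries []fieldSummary, name string, value int) bool {
	for i, s := range slots {
		if s.unknown || s.name != name {
			continue
		}
		if i >= len(summaries) {
			return true // no summary available: cannot prune
		}
		return summaries[i].lower <= value && value <= summaries[i].upper
	}
	return true // field not found: cannot prune
}

func main() {
	spec := []partitionField{{"q_part", 1}, {"p_part", 2}}
	live := map[int]bool{2: true} // column q (source ID 1) was dropped

	// Old manifest: summaries are still in the original spec order.
	summaries := []fieldSummary{{10, 20}, {1, 5}}

	// Filter: p_part = 2
	fmt.Println("compacted:", mightMatchEq(compactedSlots(spec, live), summaries, "p_part", 2))
	// prints "compacted: false" (manifest wrongly pruned)
	fmt.Println("aligned:", mightMatchEq(alignedSlots(spec, live), summaries, "p_part", 2))
	// prints "aligned: true" (manifest kept)
}
```

In the compacted layout `p_part` lands in slot 0 and is checked against the `q_part` bounds; in the aligned layout it stays in slot 1 and is checked against its own bounds.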
Add a regression test with:

- partition spec: `identity(q), identity(p)`
- current schema after evolution: `q` dropped, `p` still present
- row filter: `p = 2`, projected to the partition filter for `p_part`

Before the fix, the matching manifest can be incorrectly pruned. After the fix, the matching file is returned.
