rraulinio opened a new issue, #1190:
URL: https://github.com/apache/iceberg-go/issues/1190

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   ## Problem
   
   Manifest-level pruning can read the wrong partition field summary after a 
partition source column is dropped from the current schema.
   
   Example:
   
   A table is partitioned by two fields in this order:
   
   ```text
   q_part, p_part
   ```
   
   Manifest files store partition summaries positionally, so:
   
   ```text
   slot 0 = q_part summary
   slot 1 = p_part summary
   ```
   
   If the table schema later drops column `q`, iceberg-go currently builds a 
compacted partition schema that only contains:
   
   ```text
   slot 0 = p_part
   ```
   
   But old manifest files still store summaries in the original order:
   
   ```text
   slot 0 = q_part
   slot 1 = p_part
   ```
   
   So a query like:
   
   ```sql
   WHERE p = 2
   ```
   
   can accidentally compare `2` against `q_part`'s summary instead of 
`p_part`'s summary. If `q_part` has bounds like `10..20`, the manifest 
evaluator may conclude that `p = 2` cannot match and skip the manifest, even 
though matching data files exist.
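
The misalignment can be reproduced with a small self-contained sketch. This is not iceberg-go code: `fieldSummary`, `mightMatchEq`, and `slotOf` are hypothetical names, and the `1..5` bounds for `p_part` are assumed for illustration. Only the `10..20` bounds for `q_part` come from the example above.

```go
package main

import "fmt"

// fieldSummary is a simplified stand-in for a manifest's per-field
// partition summary (lower and upper bound only).
type fieldSummary struct {
	lower, upper int
}

// mightMatchEq reports whether a partition value equal to v could fall
// within the bounds of the summary stored at the given slot.
func mightMatchEq(summaries []fieldSummary, slot, v int) bool {
	s := summaries[slot]
	return s.lower <= v && v <= s.upper
}

// slotOf returns the position of name in a list of partition field
// names, or -1 if it is absent.
func slotOf(fields []string, name string) int {
	for i, f := range fields {
		if f == name {
			return i
		}
	}
	return -1
}

func main() {
	// Summaries as written in the old manifest:
	// slot 0 = q_part (10..20), slot 1 = p_part (1..5, assumed).
	summaries := []fieldSummary{{10, 20}, {1, 5}}

	// Compacted partition schema: q_part removed, p_part shifts to slot 0.
	compacted := []string{"p_part"}
	// Aligned partition schema: positions follow the partition spec.
	aligned := []string{"q_part", "p_part"}

	// p = 2 checked against slot 0 reads q_part's bounds and prunes.
	fmt.Println(mightMatchEq(summaries, slotOf(compacted, "p_part"), 2)) // false
	// p = 2 checked against slot 1 reads p_part's bounds and keeps the manifest.
	fmt.Println(mightMatchEq(summaries, slotOf(aligned, "p_part"), 2)) // true
}
```

The only difference between the two calls is the slot index derived from the partition schema, which is what decides whether the manifest is kept.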
   
   ## Expected Behavior
   
   Dropping a source column should not shift the positions of later partition 
fields during manifest pruning.
   
   Dropped-source partition fields should stay as inert placeholders, so 
manifest summary positions remain aligned with the partition spec:
   
   ```text
   slot 0 = dropped q_part placeholder
   slot 1 = p_part
   ```
   
   This matches Java Iceberg's partition type behavior, where dropped 
partition-source fields are represented with `UnknownType` in the partition 
type.
   
   ## Impact
   
   Queries can silently return incomplete results: a manifest that contains 
matching data files may be pruned and never scanned.
   
   ## Proposed Fix
   
   When building the partition schema used for manifest evaluation, keep every 
partition field from the partition spec. If the source column no longer exists 
in the scan schema, use an `UnknownType` placeholder for that partition field 
instead of dropping it.
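
A minimal sketch of that idea, using simplified stand-in types rather than the actual iceberg-go API. `specField`, `partField`, `buildPartitionFields`, the string type names, and the field IDs 1 and 2 are all hypothetical. It also assumes identity transforms, so a partition field's type equals its source column's type.

```go
package main

import "fmt"

// specField is a simplified stand-in for one field of a partition spec.
type specField struct {
	name     string
	sourceID int // ID of the source column in the table schema
}

// partField is one slot of the partition type used for manifest evaluation.
type partField struct {
	name string
	typ  string // "unknown" marks an inert placeholder
}

// buildPartitionFields emits exactly one slot per partition-spec field,
// in spec order. When the source column is missing from schemaTypes
// (column ID -> type name), the slot is kept as an "unknown" placeholder
// instead of being dropped, so later fields keep their positions.
func buildPartitionFields(spec []specField, schemaTypes map[int]string) []partField {
	out := make([]partField, 0, len(spec))
	for _, f := range spec {
		typ, ok := schemaTypes[f.sourceID]
		if !ok {
			typ = "unknown"
		}
		out = append(out, partField{name: f.name, typ: typ})
	}
	return out
}

func main() {
	spec := []specField{{"q_part", 1}, {"p_part", 2}}
	// Current schema after evolution: column q (ID 1) dropped, p (ID 2) kept.
	schemaTypes := map[int]string{2: "int"}

	for i, f := range buildPartitionFields(spec, schemaTypes) {
		fmt.Printf("slot %d = %s (%s)\n", i, f.name, f.typ)
	}
}
```

Because the output always has the same length and order as the partition spec, slot `i` of the partition type lines up with slot `i` of every manifest's summaries written under that spec.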
   
   Add a regression test with:
   
   - partition spec: `identity(q), identity(p)`
   - current schema after evolution: `q` dropped, `p` still present
   - row filter: `p = 2`, projected to the partition filter for `p_part`
   
   Before the fix, the matching manifest can be incorrectly pruned. After the 
fix, the matching file is returned.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

