KevinJiao opened a new pull request, #2685:
URL: https://github.com/apache/iceberg-python/pull/2685
Closes #2672
# Rationale for this change
When performing column projection on partitioned tables with schema
evolution, PyIceberg incorrectly uses the projected schema (containing only
the selected columns) instead of the full table schema when building
partition types in `_get_column_projection_values()`. This causes
`ValueError: Could not find field with id: X` when all of the following hold:
1. Reading from partitioned Iceberg tables
2. Using column projection (selecting specific columns, not `SELECT *`)
3. Selected columns do NOT include the partition field(s)
4. The table has undergone schema evolution (fields added/removed after
initial creation)
5. Reading files that are missing some of the selected columns (written
before schema evolution)
The root cause is that `partition_spec.partition_type(projected_schema)`
fails because the projected schema may be missing fields that the
partition specification references.
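The failure mode can be illustrated with a toy model. This is a minimal sketch, not PyIceberg's code: `Field`, `ToySchema`, and `toy_partition_type` are hypothetical stand-ins, and the only behavior assumed from the report above is that resolving a partition source field against a schema that lacks it raises `ValueError: Could not find field with id: X`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    field_id: int
    name: str


class ToySchema:
    """Hypothetical stand-in for a schema: looks fields up by id."""

    def __init__(self, *fields: Field) -> None:
        self._by_id = {f.field_id: f for f in fields}

    def find_field(self, field_id: int) -> Field:
        if field_id not in self._by_id:
            raise ValueError(f"Could not find field with id: {field_id}")
        return self._by_id[field_id]


def toy_partition_type(partition_source_ids: List[int], schema: ToySchema) -> List[Field]:
    """Resolve every partition source field against the given schema."""
    return [schema.find_field(source_id) for source_id in partition_source_ids]


# Full table schema: "region" (id 2) is the partition source column.
full_schema = ToySchema(Field(1, "id"), Field(2, "region"), Field(3, "note"))
# Projection that selects only "id" and "note", dropping the partition column.
projected_schema = ToySchema(Field(1, "id"), Field(3, "note"))
partition_source_ids = [2]

# Resolving against the full schema succeeds.
resolved = toy_partition_type(partition_source_ids, full_schema)

# Resolving against the projected schema fails, as in the reported error.
try:
    toy_partition_type(partition_source_ids, projected_schema)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

In this model the lookup fails whenever the projection omits a partition source column, which corresponds to condition 3 in the list above.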
The fix passes the full table schema from
`ArrowScan._table_metadata.schema()` through `_task_to_record_batches()` to
`_get_column_projection_values()`, ensuring all fields are available when
building partition accessors.
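The shape of the fix can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual diff: schemas are modeled as plain `dict`s of field id to column name, and `toy_projection_values` and `toy_task_projection` are hypothetical names that only mirror the idea of passing the full table schema down the call chain alongside the projected one.

```python
from typing import Any, Dict, List, Set

Schema = Dict[int, str]  # toy schema: field id -> column name


def toy_projection_values(
    file_field_ids: Set[int],
    partition_source_ids: List[int],
    partition_values: List[Any],
    projected_schema: Schema,
    table_schema: Schema,
) -> Dict[str, Any]:
    # Nothing to fill in if the file already contains every selected column.
    missing_ids = set(projected_schema) - file_field_ids
    if not missing_ids:
        return {}
    # Resolve partition fields against the FULL table schema, so the lookup
    # does not depend on which columns the caller selected.
    position_by_source_id: Dict[int, int] = {}
    for position, source_id in enumerate(partition_source_ids):
        if source_id not in table_schema:
            raise ValueError(f"Could not find field with id: {source_id}")
        position_by_source_id[source_id] = position
    # Only selected columns that are partition sources get a projected value.
    return {
        projected_schema[field_id]: partition_values[position_by_source_id[field_id]]
        for field_id in missing_ids
        if field_id in position_by_source_id
    }


def toy_task_projection(task: Dict[str, Any], projected_schema: Schema, table_schema: Schema) -> Dict[str, Any]:
    # The caller threads the full table schema through to the helper.
    return toy_projection_values(
        task["file_field_ids"],
        task["partition_source_ids"],
        task["partition_values"],
        projected_schema,
        table_schema,
    )


table_schema = {1: "id", 2: "region", 3: "note"}
# A file written before "note" (id 3) was added to the table.
old_file_task = {"file_field_ids": {1, 2}, "partition_source_ids": [2], "partition_values": ["eu"]}
projection = {1: "id", 3: "note"}  # excludes the partition column

# With the full schema passed through, the lookup succeeds.
fixed_result = toy_task_projection(old_file_task, projection, table_schema)

# Passing the projected schema in its place reproduces the old failure.
try:
    toy_task_projection(old_file_task, projection, projection)
    buggy_error = None
except ValueError as exc:
    buggy_error = str(exc)
```

The early return also shows why the error only appeared for files missing some selected columns: when a file contains every projected field, the partition lookup is never reached in this model.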
## Are these changes tested?
Yes. Added a test `test_partition_column_projection_with_schema_evolution`
that:
- Creates a partitioned table with an initial schema
- Writes data with the initial schema
- Evolves the schema by adding a new column
- Writes data with the evolved schema
- Performs column projection that excludes the partition field
## Are there any user-facing changes?
No. Only internal helpers are changed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]