Re: [I] API table.scan does not conform to Iceberg spec for identity partition columns [iceberg-python]

via GitHub Sat, 14 Dec 2024 13:36:10 -0800


kevinjqliu commented on issue #1401:
URL: https://github.com/apache/iceberg-python/issues/1401#issuecomment-2543350860

That makes sense to me. I think we generally need a place to replicate the
[column projection logic according to the
spec](https://iceberg.apache.org/spec/#column-projection).
Currently, on the read path, the only projection we do is column pruning:
https://github.com/apache/iceberg-python/blob/a97d13c17cd03f86252b9df2c65532ec45fb05da/pyiceberg/io/pyarrow.py#L1246
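As a rough illustration of why pruning alone isn't enough, here is a minimal sketch that models a data file's columns as a plain dict keyed by field ID. This is a hypothetical stand-in, not PyIceberg's actual data structures. Pruning keeps only the projected IDs the file actually has, so a projected field with no column in the file silently drops out of the result.

```python
# Hypothetical model: a data file's columns keyed by Iceberg field ID.
FileColumns = dict[int, list]


def prune_columns(file_columns: FileColumns, projected_ids: list[int]) -> FileColumns:
    """Keep only the file columns whose field IDs are in the projection."""
    return {fid: file_columns[fid] for fid in projected_ids if fid in file_columns}


# The table schema projects field IDs 1, 2 and 3, but this file was written
# without field 3 (e.g. an identity partition column kept only in metadata).
file_columns = {1: [10, 11], 2: ["a", "b"]}
pruned = prune_columns(file_columns, projected_ids=[1, 2, 3])
print(sorted(pruned))  # [1, 2]  -> field 3 is simply missing
```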

> By comparing the projected schema vs the file projection schema

Yeah, the issue occurs when the table schema has fields that are not present
in the file schema.
From the spec:
```
Values for field ids which are not present in a data file must be resolved
according the following rules
```
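The spec then lists those rules in order; the first returns the value from the data file's partition struct when an identity transform exists for the field, and the last returns null in all other cases. Below is a minimal sketch covering only those two rules (name mapping and default values are skipped). The function name and its inputs are hypothetical, not an existing PyIceberg API.

```python
from typing import Any, Optional


def resolve_missing_fields(
    projected_ids: list[int],
    file_ids: set[int],
    identity_partition_values: dict[int, Any],
) -> dict[int, Optional[Any]]:
    """Return a constant value for every projected field the file lacks.

    Hypothetical inputs: field IDs of the projected table schema, field IDs
    present in the data file, and the data file's partition values keyed by
    the source field ID of each identity-transformed partition field.
    Simplified: name mapping and default values are not modelled.
    """
    resolved: dict[int, Optional[Any]] = {}
    for field_id in projected_ids:
        if field_id in file_ids:
            continue  # present in the file: read it normally
        # Identity-partition rule first; otherwise fall back to null (None).
        resolved[field_id] = identity_partition_values.get(field_id)
    return resolved


missing = resolve_missing_fields(
    projected_ids=[1, 2, 3, 4],
    file_ids={1, 2},
    identity_partition_values={3: "us-east"},
)
print(missing)  # {3: 'us-east', 4: None}
```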

> Check if the data file partition struct contains that partition field (check by name)

We don't need this extra check, since the table/file schema mismatch already
tells us which columns are missing. Also, we'd always want to check by field
ID, not by name. From the spec:
```
Columns in Iceberg data files are selected by field id.
```
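To make the "by field ID" point concrete, here is a small hypothetical example: after a column rename, a lookup by name misses the data file's column while a lookup by field ID still finds it. The schema dicts are illustrative stand-ins, not PyIceberg types.

```python
# Hypothetical: the column with field ID 3 was written as "region" and later
# renamed to "area" in the table schema. Schemas map field ID -> column name.
table_schema = {1: "id", 2: "name", 3: "area"}
file_schema = {1: "id", 2: "name", 3: "region"}


def present_by_name(field_id: int) -> bool:
    """Fragile: matches on the current table name, which renames can change."""
    return table_schema[field_id] in file_schema.values()


def present_by_id(field_id: int) -> bool:
    """Matches on the field ID, which stays stable across renames."""
    return field_id in file_schema


print(present_by_name(3), present_by_id(3))  # False True
```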

> Try to inject this new column in the resultant RecordBatch

Yeah, we'd want to append the resolved value as a new column to the data
file's records. Luckily, Arrow is columnar, so there won't be much of a penalty.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
