kevinjqliu commented on issue #1401:
URL: 
https://github.com/apache/iceberg-python/issues/1401#issuecomment-2543350860

   That makes sense to me. I think we generally need a place to replicate the 
[column projection logic according to the 
spec](https://iceberg.apache.org/spec/#column-projection).
   Currently, on the read path, the only projection done is column pruning: 
https://github.com/apache/iceberg-python/blob/a97d13c17cd03f86252b9df2c65532ec45fb05da/pyiceberg/io/pyarrow.py#L1246
   
   > By comparing the projected schema vs the file projection schema
   
   Yeah, the issue occurs when the table schema has fields that are not present 
in the file schema. 
   From the spec:
   ```
   Values for field ids which are not present in a data file must be resolved 
according the following rules 
   ```
   
   > Check if the data file partition struct contains that partition field 
(check by name)
   
   We don't need this extra check, since the table/file schema mismatch already 
tells us which columns are missing. Also, we'd always want to check by field id, 
not by name.
   From the spec:
   ```
   Columns in Iceberg data files are selected by field id.
   ```
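
   As a rough sketch of the idea (not PyIceberg's actual API), the missing 
columns can be found by diffing field ids and then resolved following the 
spec's projection rules. `Field`, `missing_fields`, and `resolve_missing` are 
hypothetical names, the identity-partition values are assumed to be already 
keyed by source field id, and the name-mapping rule is left out for brevity:

   ```python
   from dataclasses import dataclass
   from typing import Any, Dict, List, Optional, Set


   @dataclass(frozen=True)
   class Field:
       """Hypothetical stand-in for a schema field; not a PyIceberg class."""
       field_id: int
       name: str
       initial_default: Optional[Any] = None


   def missing_fields(table_fields: List[Field], file_field_ids: Set[int]) -> List[Field]:
       # Compare by field id, never by name.
       return [f for f in table_fields if f.field_id not in file_field_ids]


   def resolve_missing(field: Field, identity_partition_values: Dict[int, Any]) -> Any:
       # Rule: use the partition value if an identity transform covers this field.
       if field.field_id in identity_partition_values:
           return identity_partition_values[field.field_id]
       # Rule: otherwise use the field's initial-default, if one is defined.
       if field.initial_default is not None:
           return field.initial_default
       # Rule: otherwise the value is null.
       return None


   table_fields = [Field(1, "id"), Field(2, "region"), Field(3, "score", initial_default=0)]
   file_field_ids = {1}                    # the data file only contains "id"
   identity_partition_values = {2: "eu"}   # assumed: source field id -> partition value

   resolved = {
       f.name: resolve_missing(f, identity_partition_values)
       for f in missing_fields(table_fields, file_field_ids)
   }
   ```

   Here `region` would come from the partition struct and `score` from its 
default, without any name-based lookup.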
   
   > Try to inject this new column in the resultant RecordBatch
   
   Yeah, we'd want to append whatever the value is to the data file records. 
Luckily Arrow is columnar, so adding a column won't carry much of a penalty. 
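
   A minimal sketch of what that append could look like with PyArrow, assuming 
the resolved value is a single constant per data file. 
`append_constant_column` is a hypothetical helper, not something that exists 
in PyIceberg:

   ```python
   import pyarrow as pa


   def append_constant_column(
       batch: pa.RecordBatch, name: str, value, arrow_type: pa.DataType
   ) -> pa.RecordBatch:
       """Return a new batch with a constant-valued column added at the end."""
       # Build an array that repeats the same scalar once per row.
       constant = pa.repeat(pa.scalar(value, type=arrow_type), batch.num_rows)
       # Existing column arrays are reused as-is; only the new column is built.
       return pa.RecordBatch.from_arrays(
           batch.columns + [constant],
           names=batch.schema.names + [name],
       )


   batch = pa.RecordBatch.from_pydict({"id": [1, 2, 3]})
   result = append_constant_column(batch, "region", "eu", pa.string())
   ```

   The existing arrays are passed through unchanged, so the cost is limited to 
building the one new column.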


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
