liurenjie1024 commented on issue #1047:
URL: https://github.com/apache/iceberg-rust/issues/1047#issuecomment-2713343172

   Thanks @ZENOTME for raising this. I think what's missing is a `FileReader` 
which accepts the following arguments:
   
   1. File path
   2. File range
   3. Expected schema
   4. Arrow batch size
   
   This reader needs to convert files (parquet, orc, avro) into arrow record 
batches, handling things like missing columns and type promotion that are 
caused by schema evolution.
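
   A rough sketch of what such a reader would have to decide per column, 
using only the standard library. Every name here (`FileReadRequest`, `Field`, 
`ColType`, `ColumnPlan`, `plan_columns`) is hypothetical and not an existing 
iceberg-rust api; it assumes columns are matched by field id, that a column 
missing from the file is filled with nulls, and it only models the int -> long 
and float -> double promotions:

```rust
use std::ops::Range;

/// Simplified column type, a stand-in for the real type system (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType { Int, Long, Float, Double, String }

/// A column identified by field id, so renames do not break matching.
#[derive(Debug, Clone, PartialEq)]
struct Field { id: i32, name: String, ty: ColType }

fn field(id: i32, name: &str, ty: ColType) -> Field {
    Field { id, name: name.to_string(), ty }
}

/// The four arguments from the list above, bundled into one request (hypothetical).
#[derive(Debug, Clone)]
struct FileReadRequest {
    path: String,
    range: Range<u64>,
    expected_schema: Vec<Field>,
    batch_size: usize,
}

/// How one column of the expected schema is produced from the file.
#[derive(Debug, PartialEq)]
enum ColumnPlan {
    ReadAsIs { file_index: usize },
    Promote { file_index: usize, from: ColType, to: ColType },
    FillNull,
}

/// Only the widening promotions modelled in this sketch.
fn can_promote(from: ColType, to: ColType) -> bool {
    matches!(
        (from, to),
        (ColType::Int, ColType::Long) | (ColType::Float, ColType::Double)
    )
}

/// Decide, for each expected column, whether to read, promote, or fill nulls.
fn plan_columns(file_schema: &[Field], expected: &[Field]) -> Result<Vec<ColumnPlan>, String> {
    expected
        .iter()
        .map(|want| match file_schema.iter().position(|f| f.id == want.id) {
            // Column was added after the file was written.
            None => Ok(ColumnPlan::FillNull),
            Some(i) => {
                let have = file_schema[i].ty;
                if have == want.ty {
                    Ok(ColumnPlan::ReadAsIs { file_index: i })
                } else if can_promote(have, want.ty) {
                    Ok(ColumnPlan::Promote { file_index: i, from: have, to: want.ty })
                } else {
                    Err(format!(
                        "column {} cannot be read as {:?} (file has {:?})",
                        want.name, want.ty, have
                    ))
                }
            }
        })
        .collect()
}

/// Example request: path, byte range, expected schema, and batch size.
fn demo_request() -> FileReadRequest {
    FileReadRequest {
        path: "data/part-0.parquet".to_string(),
        range: 0..4096,
        expected_schema: vec![
            field(1, "id", ColType::Long),
            field(2, "name", ColType::String),
            field(3, "score", ColType::Double),
        ],
        batch_size: 1024,
    }
}
```

   For a file written with `id: int` and `name: string`, planning against 
`demo_request().expected_schema` would promote `id`, read `name` as is, and 
fill `score` with nulls. Actual decoding into arrow record batches is left out.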
   
   With this api, it would be easy to implement the `read_data`, 
`read_pos_delete`, `read_eq_delete` you mentioned. But I'm not sure if we 
actually need to provide these apis. I think `FileReader` + `FileScanTask` 
already provide enough flexibility for compute engines. For example, an engine 
can choose to join data files with pos deletions and eq deletions in the 
logical plan, or it could choose to implement its own file scan operator.
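
   To illustrate the join option, here is a minimal in-memory sketch of the 
anti-join an engine might perform once it has read a data file and its delete 
files. `Row` and `apply_deletes` are made-up names, and a real engine would do 
this on arrow batches in its own plan rather than on a `Vec`; the equality 
delete is assumed to be on a single `id` column:

```rust
use std::collections::HashSet;

/// One row of a data file together with its position in that file (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Row { pos: u64, id: i64, name: String }

/// Keep rows that are hit by neither a position delete nor an equality delete.
fn apply_deletes(
    rows: Vec<Row>,
    pos_deletes: &HashSet<u64>,    // row positions listed in pos delete files
    eq_deleted_ids: &HashSet<i64>, // `id` values listed in eq delete files
) -> Vec<Row> {
    rows.into_iter()
        .filter(|r| !pos_deletes.contains(&r.pos) && !eq_deleted_ids.contains(&r.id))
        .collect()
}
```

   Both kinds of delete reduce to a set-membership filter here; an engine that 
implements its own file scan operator could apply the same filter while 
decoding instead of as a separate join.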


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

