ZENOTME commented on issue #1047:
URL: https://github.com/apache/iceberg-rust/issues/1047#issuecomment-2722967864

   > Thanks [@ZENOTME](https://github.com/ZENOTME) for raising this. I think what's missing is a `FileReader` which accepts the following arguments:
   > 
   > 1. File path
   > 2. File range
   > 3. Expected schema
   > 4. Arrow batch size
   > 
   > This reader needs to convert files (Parquet, ORC, Avro) into Arrow record batches, handling things like missing columns and type promotion that are caused by schema evolution.
   > 
   > With this API, it would be easy to implement the `read_data`, `read_pos_delete` and `read_eq_delete` you mentioned. But I'm not sure we actually need to provide these APIs. I think `FileReader` + `FileScanTask` already provides enough flexibility for compute engines. For example, an engine can choose to join data files with position deletes and equality deletes in its logical plan, or it can choose to implement its own file scan operator.
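To make the proposal concrete, here is a minimal, std-only Rust sketch of what such a `FileReader` might do for a data file. Everything in it is hypothetical: `FileReadRequest`, `read_batches`, `promote` and the simplified `Field`/`Value` types stand in for the Iceberg schema and Arrow arrays and are not the iceberg-rust API. It skips actual Parquet/ORC/Avro decoding and takes already-decoded columns keyed by Iceberg field id, to show the two schema-evolution cases mentioned above: a column missing from the file is null-filled (only if the field is optional), and an `int` column is promoted to `long`.

```rust
use std::collections::HashMap;
use std::ops::Range;

// Simplified, hypothetical stand-ins for Iceberg types and Arrow arrays.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PrimitiveType {
    Int,
    Long,
}

#[allow(dead_code)] // `name` is kept for readability only
#[derive(Clone, Debug)]
struct Field {
    id: i32,
    name: String,
    ty: PrimitiveType,
    required: bool,
}

#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i32),
    Long(i64),
    Null,
}

/// A column as physically written in the file, keyed by Iceberg field id.
struct FileColumn {
    field_id: i32,
    ty: PrimitiveType,
    values: Vec<Value>,
}

/// The four proposed arguments. `path` and `range` would select the bytes to
/// decode; this sketch takes already-decoded columns instead.
#[allow(dead_code)]
struct FileReadRequest {
    path: String,
    range: Range<u64>,
    expected_schema: Vec<Field>,
    batch_size: usize,
}

/// One inner Vec per field of the expected schema.
type Batch = Vec<Vec<Value>>;

fn promote(v: &Value, from: PrimitiveType, to: PrimitiveType) -> Result<Value, String> {
    match (v, from, to) {
        (Value::Null, _, _) => Ok(Value::Null),
        (_, f, t) if f == t => Ok(v.clone()),
        // int -> long is one of the type promotions allowed by the Iceberg spec.
        (Value::Int(i), PrimitiveType::Int, PrimitiveType::Long) => Ok(Value::Long(i64::from(*i))),
        _ => Err(format!("cannot promote {:?} to {:?}", from, to)),
    }
}

/// Projects `num_rows` decoded rows onto the expected schema, `batch_size` rows at a time.
fn read_batches(
    req: &FileReadRequest,
    file_columns: &[FileColumn],
    num_rows: usize,
) -> Result<Vec<Batch>, String> {
    if req.batch_size == 0 {
        return Err("batch_size must be positive".to_string());
    }
    // Resolve columns by field id, not by name.
    let by_id: HashMap<i32, &FileColumn> =
        file_columns.iter().map(|c| (c.field_id, c)).collect();
    let mut batches = Vec::new();
    let mut start = 0;
    while start < num_rows {
        let end = (start + req.batch_size).min(num_rows);
        let mut batch = Vec::with_capacity(req.expected_schema.len());
        for field in &req.expected_schema {
            let column = match by_id.get(&field.id) {
                // Column added after this file was written: only optional fields can be null-filled.
                None if field.required => {
                    return Err(format!("required field {} is missing from the file", field.id))
                }
                None => vec![Value::Null; end - start],
                Some(c) => c
                    .values
                    .get(start..end)
                    .ok_or_else(|| format!("column {} has fewer than {} rows", c.field_id, num_rows))?
                    .iter()
                    .map(|v| promote(v, c.ty, field.ty))
                    .collect::<Result<Vec<_>, _>>()?,
            };
            batch.push(column);
        }
        batches.push(batch);
        start = end;
    }
    Ok(batches)
}
```

Matching by field id rather than by column name is what lets a renamed column still resolve. A real reader would also need nested types and the remaining allowed promotions (e.g. `float` to `double`).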
   
   In this design, does `ArrowReader` reuse `FileReader`?
   - If so, I think we may need to refactor some of the logic in `ArrowReader`.
   - Otherwise, `FileReader` is an independent component, which may be easier to maintain.
   
   And for delete files (position deletes, equality deletes), do we need to handle things like missing columns and type promotion? 🤔 It seems that for position and equality deletes, we can't fill in a value when the column is missing. So here we may need separate `read_data`, `read_pos_delete` and `read_eq_delete` APIs to handle each case differently.
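The distinction can be sketched as a per-file-kind projection rule. This is a minimal sketch under assumptions: `FileKind`, `ColumnSource` and `plan_columns` are hypothetical names, not iceberg-rust APIs. The field ids `2147483546` (`file_path`) and `2147483545` (`pos`) are the reserved ids the Iceberg spec assigns to position delete columns. Data files may null-fill a missing projected column, while position and equality delete files return an error instead.

```rust
use std::collections::HashSet;

// Reserved field ids that the Iceberg spec assigns to position delete columns.
const FILE_PATH_FIELD_ID: i32 = 2147483546;
const POS_FIELD_ID: i32 = 2147483545;

/// Where a projected column's values come from.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ColumnSource {
    File,
    Nulls,
}

/// Hypothetical file kinds, each with its own projection.
enum FileKind<'a> {
    Data { projected_ids: &'a [i32] },
    PositionDelete,
    EqualityDelete { equality_ids: &'a [i32] },
}

/// Decides, per projected field id, whether to read it from the file or fill nulls.
/// Only data files may null-fill; delete files fail if a needed column is absent.
fn plan_columns(
    kind: &FileKind<'_>,
    file_ids: &HashSet<i32>,
) -> Result<Vec<(i32, ColumnSource)>, String> {
    let (ids, may_fill_nulls) = match kind {
        FileKind::Data { projected_ids } => (projected_ids.to_vec(), true),
        FileKind::PositionDelete => (vec![FILE_PATH_FIELD_ID, POS_FIELD_ID], false),
        FileKind::EqualityDelete { equality_ids } => (equality_ids.to_vec(), false),
    };
    ids.into_iter()
        .map(|id| {
            if file_ids.contains(&id) {
                Ok((id, ColumnSource::File))
            } else if may_fill_nulls {
                Ok((id, ColumnSource::Nulls))
            } else {
                Err(format!("delete file is missing field id {id}"))
            }
        })
        .collect()
}
```

Keeping the rule in one function while exposing three entry points is one option; the point of the sketch is only that the "missing column" policy differs by file kind.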


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

