gruuya commented on issue #813:
URL: https://github.com/apache/iceberg-rust/issues/813#issuecomment-2550698180

   I see two ways of resolving this:
   
   1. Align the schema returned from `TableProvider::schema` to match the one from the Parquet files. I dislike this approach as it feels hacky: the canonical arrow schema for a given table will still differ from what `TableProvider::schema` returns, it would necessitate reading the Parquet file metadata when instantiating the table, and it is vulnerable to schema drift across Parquet files in different table versions.
   2. Coerce the streamed batches from the scan to match the canonical arrow schema (see the sketch below):
      - one way to do this is to instruct arrow-rs itself to perform the casting automatically, as suggested in #814
      - another might be to do it via the `RecordBatchTransformer`
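
   To make option 2 concrete, here is a minimal sketch (not the actual iceberg-rust code) of coercing each batch with arrow's `cast` kernel. The `coerce_batch` helper is hypothetical, and the real coercion would presumably live in the scan stream or in `RecordBatchTransformer` rather than a free function:

   ```rust
   use arrow_array::{ArrayRef, RecordBatch};
   use arrow_cast::cast;
   use arrow_schema::{ArrowError, SchemaRef};

   /// Hypothetical helper: cast every column of `batch` to the corresponding
   /// field type in `canonical_schema`. Assumes both have the same columns in
   /// the same order and that each cast is lossless (e.g. Utf8 -> LargeUtf8).
   fn coerce_batch(
       batch: &RecordBatch,
       canonical_schema: SchemaRef,
   ) -> Result<RecordBatch, ArrowError> {
       let columns = batch
           .columns()
           .iter()
           .zip(canonical_schema.fields().iter())
           .map(|(column, field)| cast(column, field.data_type()))
           .collect::<Result<Vec<ArrayRef>, _>>()?;

       // Rebuild the batch under the canonical schema so downstream consumers
       // (e.g. DataFusion) see exactly what `TableProvider::schema` advertised.
       RecordBatch::try_new(canonical_schema, columns)
   }
   ```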

