[I] Reported and actual arrow schema of the table can be different [iceberg-rust]

via GitHub Tue, 17 Dec 2024 06:50:27 -0800


gruuya opened a new issue, #813:
URL: https://github.com/apache/iceberg-rust/issues/813


   This is related to https://github.com/apache/iceberg-rust/issues/783.
   
   Here is what happens:
   - I use `pyiceberg` to create an Iceberg table from a Parquet file.
   - The Parquet file carries logical type annotations that map to narrower Arrow types, e.g. `DataType::Int16` (`required int32 c1 (INTEGER(16,true)) = 1;`).
   - Thanks to https://github.com/apache/iceberg-rust/issues/783, we now upcast such a column to the native 32-bit Iceberg `int` type and can read it.
   - This upcast type (`DataType::Int32`) is also what is reported by e.g. `TableProvider::schema`.
   - However, the actual type in the Arrow record batches that are read (inferred from the Parquet annotation) is `DataType::Int16`, so the reported and actual schemas do not match.
   - As a result, a DataFusion query such as `SELECT c1 FROM t WHERE c1 <= 2` fails with `Invalid comparison operation: Int16 <= Int32`.
   - Ultimately, the schema mismatch tricks one of the logical optimizers into assuming that casting the right-hand side (i.e. the literal `2`) to `DataType::Int32` (per the reported schema) makes the comparison valid.

