pedorro opened a new issue, #11650: URL: https://github.com/apache/iceberg/issues/11650
### Apache Iceberg version

1.7.0 (latest release)

### Query engine

Athena

### Please describe the bug 🐞

When both of the following criteria are met, queries for a renamed column return nulls instead of the original values:

1. The data is in an existing Parquet file that is 'appended' to the table (using the Java SDK `appendFile()` function)
2. The existing Parquet file was created by something _other_ than Iceberg

As a contrived example, assume there exists in S3 a Parquet file with three columns and one row. This Parquet file is written by something other than the Iceberg libs (e.g. Apache Parquet lib v1.14.4). This file is appended to a new (empty) Iceberg table called `example_table`. Using the Java SDK:

```
table.newAppend().appendFile(existingParquet).commit();
```

Querying (in AWS Athena) from the table now returns the single row.

```
select * from example_table;

| first | second | third |
+-------+--------+-------+
| aaa   | bbb    | ccc   |
```

Now rename column `second` to `renamed`. Using the Java SDK:

```
table.updateSchema().renameColumn("second", "renamed").commit();
```

Querying from the table _now_ returns the single row, but without the value in column `renamed`:

```
select * from example_table;

| first | renamed | third |
+-------+---------+-------+
| aaa   |         | ccc   |
```

This is true when queried by both AWS Athena & Redshift Spectrum.

Interestingly, if the existing Parquet file _was_ originally created by Iceberg (either via an `insert` query or using the Java SDK), this issue does not present. In that case, a query for the renamed column _does_ return the correct (original) value. Even if the Iceberg-created Parquet file is copied or moved from its original location before being appended to a new table, the column-rename works as expected.

This suggests there is some unique (non-standard?) quality to Parquet files created by Iceberg, and the column rename operation relies on it.
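One possible explanation, sketched below as a toy model: Iceberg identifies columns by a stable field ID and writes those IDs into the Parquet files it produces, whereas a file written by another library typically carries no IDs. The sketch assumes (unconfirmed for Athena or Redshift Spectrum) that a reader falls back to matching by the column's *current* name when a file has no IDs. Every type and method name here (`RenameSketch`, `FileColumn`, `TableColumn`, `read`) is hypothetical and is not part of the Iceberg API.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types exist in Iceberg or in any query engine.
final class RenameSketch {

    /** A column as stored in a data file; fieldId is null when the writer recorded none. */
    static final class FileColumn {
        final String name;
        final Integer fieldId;
        final String value;

        FileColumn(String name, Integer fieldId, String value) {
            this.name = name;
            this.fieldId = fieldId;
            this.value = value;
        }
    }

    /** A column in the table schema: a stable ID plus its current name. */
    static final class TableColumn {
        final int fieldId;
        final String name;

        TableColumn(int fieldId, String name) {
            this.fieldId = fieldId;
            this.name = name;
        }
    }

    /** The example row as written with field IDs embedded in the file. */
    static List<FileColumn> fileWithIds() {
        return Arrays.asList(
            new FileColumn("first", 1, "aaa"),
            new FileColumn("second", 2, "bbb"),
            new FileColumn("third", 3, "ccc"));
    }

    /** The same row as written by another library, with no field IDs. */
    static List<FileColumn> fileWithoutIds() {
        return Arrays.asList(
            new FileColumn("first", null, "aaa"),
            new FileColumn("second", null, "bbb"),
            new FileColumn("third", null, "ccc"));
    }

    /**
     * Match by field ID when the file column has one; otherwise fall back to
     * the current column name (ASSUMED engine behavior). Returns null on no match.
     */
    static String read(TableColumn wanted, List<FileColumn> file) {
        for (FileColumn c : file) {
            boolean match = (c.fieldId != null)
                ? c.fieldId.intValue() == wanted.fieldId
                : c.name.equals(wanted.name);
            if (match) {
                return c.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Table schema after the rename: ID 2 is unchanged, only the name differs.
        TableColumn renamed = new TableColumn(2, "renamed");
        System.out.println(read(renamed, fileWithIds()));    // ID 2 still matches
        System.out.println(read(renamed, fileWithoutIds())); // no column named "renamed"
    }
}
```

Under that assumption, the file with IDs still yields `bbb` after the rename, while the file without IDs yields `null`, which is consistent with the two behaviors described above. It does not confirm what the engines actually do.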
### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org