fengjiajie commented on PR #8808: URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1825487523
> I'm also a little nervous about this change, how are we guaranteed that the binary is parsable as UTF8 bytes? Seems like we should just be fixing the type annotations rather than changing our readers to read files that have been written incorrectly? @RussellSpitzer Hi, can you please tell if this issue can be moved forward? We have a lot of hive tables that contain such parquet files and we are trying to convert these hive tables into iceberg tables, this process of parquet files cannot be rewritten (because of the large number of history files). We can guarantee that it could be parsed in UTF-8 because the data was originally defined as a string in hive. If it wasn't a string before, there's no reason defining it as a string when defining the iceberg table would make it fail to parse. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org