RussellSpitzer commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1826303884

   You can only guarantee this is safe for your own data; for any other user it 
could be unsafe. That’s the underlying issue with this PR: we are essentially 
allowing binary to be cast as string.

On Nov 24, 2023, at 4:47 AM, fengjiajie ***@***.***> wrote:
   
   I'm also a little nervous about this change. How are we guaranteed that the 
binary is parsable as UTF-8 bytes? It seems like we should just be fixing the 
type annotations rather than changing our readers to read files that have been 
written incorrectly.
   
   @RussellSpitzer  Hi, could you tell me whether this issue can move forward?
   We have a lot of Hive tables that contain such Parquet files, and we are 
trying to convert these Hive tables into Iceberg tables. The Parquet files 
cannot be rewritten as part of this process (because of the large number of 
historical files).
   We can guarantee that the data can be parsed as UTF-8 because it was 
originally defined as a string in Hive.
   If it wasn't a string before, there's no reason that defining it as a string 
when defining the Iceberg table would make it fail to parse.
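
The question of whether arbitrary binary is parsable as UTF-8 can be made concrete with a strict decode. This is a minimal sketch using only the Java standard library; `Utf8Check` and `isValidUtf8` are hypothetical names for illustration and are not part of Iceberg or Parquet, and this is not what the PR itself does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (an assumption for illustration, not an Iceberg API).
final class Utf8Check {
  private Utf8Check() {}

  // Returns true only if the bytes form a well-formed UTF-8 sequence.
  static boolean isValidUtf8(byte[] bytes) {
    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      decoder.decode(ByteBuffer.wrap(bytes));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }
}
```

A value written from a Hive string column should pass such a check, while an arbitrary binary value (for example the two bytes `0xC3 0x28`) would not, which is the case a reader-side cast cannot rule out for other users' files.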
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


