Vino1016 commented on issue #12514:
URL: https://github.com/apache/iceberg/issues/12514#issuecomment-2795894546

   > **Library Upgrade -**
   > 
   > We don't have a specific upgrade guide here because the library is 
attempting to be as independent from the format as possible. This means that if 
your code compiles and runs with the new version of the library, we expect it 
to work identically with your table.
   > 
   > So your main checks will be around configuration changes and other 
performance sorts of things that have changed, but under the hood your table 
won't change.
   > 
   > **File Format Switch -**
   > 
   > You can have files in multiple formats in the same table, so there shouldn't 
be anything you need to do.
   > 
   > **Catalog Migration**
   > 
   > This is much more complicated. Doing this will require changing all of your 
clients to point to Glue instead of using path-based table access. This is most 
important for writers: when you have multiple catalogs working with the same 
metadata.json, you essentially enter into a split-brain situation. I would 
probably make the Glue table "read only" and use the "registerTable" API to set 
the current Hadoop-based metadata.json as the current table path using another 
process. Eventually, when I was ready to switch over writers, I would stop the 
sync and point the writers at Glue.
   
   Thank you for your response. Through our testing, we found that while Glue 
Crawler could potentially address metadata management, we have concerns about 
the significant risks this might pose to our existing data. We've tentatively 
decided to upgrade to version 1.4.0 while retaining the original Hadoop Catalog 
instead of adopting Glue.
   
   Regarding your point about Iceberg tables supporting both ORC and Parquet 
formats without additional adaptation, we observed that the file format is 
determined by the table properties set during table creation. However, we 
noticed that the Iceberg API does not provide a way to modify these properties 
after table creation. Could you clarify if there's a recommended approach to 
dynamically adjust the write format configuration (e.g., switching new data 
writes to Parquet while retaining existing ORC files), or would this require 
creating a new table with the desired properties?
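
   To make the question concrete, here is a toy model of the behavior we are 
hoping for. It is not Iceberg code: `ToyTable`, `append`, and `set_property` are 
hypothetical names used only for illustration, and we are assuming that 
`write.format.default` is the table property that controls the write format. 
The idea is that changing the property affects only files written afterwards, 
while existing files keep their original format.

```python
from dataclasses import dataclass, field

# Assumed name of the table property that selects the format for new writes.
WRITE_FORMAT_KEY = "write.format.default"


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table: properties plus (path, format) data files."""

    properties: dict = field(default_factory=dict)
    files: list = field(default_factory=list)

    def set_property(self, key: str, value: str) -> None:
        # Changing a property never rewrites files that already exist.
        self.properties[key] = value

    def append(self, name: str) -> str:
        # The format is looked up at write time, so only new files are affected.
        fmt = self.properties.get(WRITE_FORMAT_KEY, "parquet")
        path = f"{name}.{fmt}"
        self.files.append((path, fmt))
        return path


# Table created with ORC as the write format.
table = ToyTable(properties={WRITE_FORMAT_KEY: "orc"})
table.append("data-0001")

# Desired behavior: switch new writes to Parquet, keep the old ORC file as-is.
table.set_property(WRITE_FORMAT_KEY, "parquet")
table.append("data-0002")

formats = [fmt for _, fmt in table.files]
print(formats)
```

   In other words, we would like the table to end up with a mix of ORC and 
Parquet files without recreating it.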
   
   Any insights would be greatly appreciated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

