Re: [I] Seeking Guidance on Upgrading Iceberg from 0.14.0 to 1.8.0 with Catalog and File Format Migration [iceberg]

via GitHub Thu, 10 Apr 2025 22:48:14 -0700


Vino1016 commented on issue #12514:
URL: https://github.com/apache/iceberg/issues/12514#issuecomment-2795893690


   > **Library Upgrade -**
   > 
   > We don't have a specific upgrade guide here because the library is 
attempting to be as independent from the format as possible. This means that if 
your code compiles and runs with the new version of the library, we expect it 
to work identically with your table.
   > 
   > So your main checks will be around configuration changes and other 
performance-related behavior that has changed, but under the hood your table 
won't change.
   > 
   > **File Format Switch -**
   > 
   > You can have files in multiple formats in the same table; there shouldn't 
be anything you need to do.
   > 
   > **Catalog Migration**
   > 
   > This is much more complicated. Doing this will require changing all of your 
clients to point to Glue instead of using path-based table access. This is most 
important for writers: when you have multiple catalogs working with the same 
metadata.json, you essentially enter a split-brain situation. I would 
probably make the Glue table "read only" and run a separate process that uses 
the "registerTable" API to keep it pointed at the current Hadoop-based 
metadata.json. Eventually, when I was ready to switch over writers, I would stop 
the sync and point the writers at Glue.
   
   
   
   
   Thank you for your response. Through our testing, we found that while Glue 
Crawler could potentially address metadata management, we have concerns about 
the significant risks this might pose to our existing data. We've tentatively 
decided to upgrade to version 1.4.0 while retaining the original Hadoop Catalog 
instead of adopting Glue.
   
   Regarding your point about Iceberg tables supporting both ORC and Parquet 
formats without additional adaptation, we observed that the file format is 
determined by the table properties set during table creation. However, we 
noticed that the Iceberg API does not provide a way to modify these properties 
after table creation. Could you clarify if there's a recommended approach to 
dynamically adjust the write format configuration (e.g., switching new data 
writes to Parquet while retaining existing ORC files), or would this require 
creating a new table with the desired properties?
   
   Any insights would be greatly appreciated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

