Vino1016 commented on issue #12514: URL: https://github.com/apache/iceberg/issues/12514#issuecomment-2795893690
> **Library Upgrade -**
>
> We don't have a specific upgrade guide here because the library is attempting to be as independent from the format as possible. This means that if your code compiles and runs with the new version of the library, we expect it to work identically with your table.
>
> So your main checks will be around configuration changes and other performance-related things that have changed, but under the hood your table won't change.
>
> **File Format Switch -**
>
> You can have files in multiple formats in the same table; there shouldn't be anything you need to do.
>
> **Catalog Migration**
>
> This is much more complicated. Doing this will require changing all of your clients to point to Glue instead of using path-based table access. This is most important for writers: when you have multiple catalogs working with the same metadata.json, you essentially enter into a split-brain situation. I would probably make the Glue table "read only" and use the "registerTable" API to set the current Hadoop-based metadata.json as the current table path using another process. Eventually, when I was ready to switch over writers, I would stop the sync and point the writers at Glue.

Thank you for your response. Through our testing, we found that while Glue Crawler could potentially address metadata management, we have concerns about the significant risks this might pose to our existing data. We've tentatively decided to upgrade to version 1.4.0 while retaining the original Hadoop Catalog instead of adopting Glue.
Regarding your point about Iceberg tables supporting both ORC and Parquet formats without additional adaptation, we observed that the file format is determined by the table properties set during table creation. However, we noticed that the Iceberg API does not provide a way to modify these properties after table creation. Could you clarify if there's a recommended approach to dynamically adjust the write format configuration (e.g., switching new data writes to Parquet while retaining existing ORC files), or would this require creating a new table with the desired properties? Any insights would be greatly appreciated!