slfan1989 commented on issue #12762: URL: https://github.com/apache/iceberg/issues/12762#issuecomment-2960989911
During the migration process, we encountered several challenges and addressed them with the following solutions:

- **Rollback of migrated Hive tables.** After migration, downstream consumers use the production tables in different ways, and some systems that have not been updated for a long time cannot immediately read Iceberg tables. For example, listing partitions through `SHOW PARTITIONS db.table` may no longer be supported, and users have to switch to `SELECT * FROM db.table.partitions`. As a result, users wanted the ability to roll a migrated table back to a Hive table. To address this, we introduced a new stored procedure, `revert`, which directly removes the Iceberg metadata on the HMS side (see the first sketch after this list). After the rollback, the user only needs to re-run the production tasks for the affected period. For example, if table A was migrated on June 2, 2025 and a rollback is requested on June 5, 2025, then after running the stored procedure the user simply re-runs the scheduled tasks for June 3 through June 5, 2025, and table A is a Hive table again.
- **Aligning the partition layout with the Hive table.** Hive table partitions are typically stored as `/dbName/tableName/dt=20250610/`, while Iceberg places data files under `/dbName/tableName/data/dt=20250610/`. To ensure that migrated Iceberg tables retain the original partition layout, we added a configuration option that lets them keep the original Hive layout after migration (see the second sketch below).
- **Limited support for using Iceberg tables with Spark 2.** The community's Iceberg support for Spark 3 is very robust, and we have migrated 80% of our tasks to Spark 3. However, some tasks still depend on Spark 2 due to business constraints and are difficult to migrate, yet their owners still want to use Iceberg tables. For these cases we agreed with users to support only reading Iceberg tables from Spark 2, including basic partition pushdown (the third sketch below shows the core filter conversion). Since this code may be of limited interest to the Iceberg community, I plan to release it in a personal repository and share a link for users with similar needs.
- **Supporting more partition pushdown scenarios.** Some of our tables have multi-level partitions, such as year, month, and day. Writing partition filter conditions for such tables is cumbersome: filtering data between 2023-01-01 and 2025-12-31, for instance, expands into a long partition expression. To solve this, we introduced a syntax similar to `concat(year, month, day) >= '2023-01-01' and concat(year, month, day) <= '2025-12-31'`. We defined a multi-level partition pushdown filter expression in Iceberg and convert Spark's filters into it, which greatly simplifies users' queries (the last sketch below shows the bound expansion).

These optimizations have effectively addressed the challenges we encountered during migration, giving users a smooth transition to Iceberg while improving the flexibility and efficiency of the system.
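For the rollback item, here is a minimal sketch of what our `revert` procedure does on the HMS side. The class and method names are illustrative and not part of upstream Iceberg; the parameter keys (`table_type`, `metadata_location`, `previous_metadata_location`) are the ones Iceberg's HiveCatalog writes. Our production version also restores the table's original SerDe/input format and partition registrations, which is omitted here.

```java
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Table;

public class RevertIcebergTable {
  public static void revert(HiveConf conf, String dbName, String tableName) throws Exception {
    HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
    try {
      Table table = client.getTable(dbName, tableName);
      // Iceberg's HiveCatalog marks the table with table_type=ICEBERG and keeps a
      // pointer to the current metadata file; removing these parameters makes
      // engines treat the table as a plain Hive table again.
      table.getParameters().remove("table_type");
      table.getParameters().remove("metadata_location");
      table.getParameters().remove("previous_metadata_location");
      client.alter_table(dbName, tableName, table);
    } finally {
      client.close();
    }
  }
}
```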
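For the partition-layout item, our change is a catalog-level default, but stock Iceberg already exposes a per-table property that approximates it: pointing `write.data.path` at the table root makes new partition directories land directly under it. A sketch, with a placeholder table name and HDFS path:

```java
import org.apache.spark.sql.SparkSession;

public class KeepHiveLayout {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().getOrCreate();
    // write.data.path is a standard Iceberg table property. Pointing it at the
    // table root yields /dbName/tableName/dt=20250610/... for new data files
    // instead of the default /dbName/tableName/data/dt=20250610/... layout.
    spark.sql("ALTER TABLE db.table SET TBLPROPERTIES ("
        + "'write.data.path'='hdfs://nn/warehouse/dbName.db/tableName')");
  }
}
```

Note that this only affects newly written files; files written before the property change stay where they are.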
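For the Spark 2 item, the heart of the read path is translating the `org.apache.spark.sql.sources.Filter` objects that Spark 2 pushes into a DataSource V2 reader into Iceberg `Expression`s, much like the `SparkFilters` utility in Iceberg's Spark 3 modules. A trimmed sketch handling a few operators (our real version covers more, and anything it cannot convert is simply not pushed down):

```java
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;
import org.apache.spark.sql.sources.And;
import org.apache.spark.sql.sources.EqualTo;
import org.apache.spark.sql.sources.Filter;
import org.apache.spark.sql.sources.GreaterThanOrEqual;
import org.apache.spark.sql.sources.LessThanOrEqual;

public final class Spark2FilterConversion {
  private Spark2FilterConversion() {}

  // Converts a Spark 2 pushed filter into an Iceberg expression that can be
  // passed to TableScan#filter(...), or returns null if the filter is not
  // supported (Spark then re-evaluates it after the scan).
  public static Expression convert(Filter filter) {
    if (filter instanceof EqualTo) {
      EqualTo eq = (EqualTo) filter;
      return Expressions.equal(eq.attribute(), eq.value());
    } else if (filter instanceof GreaterThanOrEqual) {
      GreaterThanOrEqual ge = (GreaterThanOrEqual) filter;
      return Expressions.greaterThanOrEqual(ge.attribute(), ge.value());
    } else if (filter instanceof LessThanOrEqual) {
      LessThanOrEqual le = (LessThanOrEqual) filter;
      return Expressions.lessThanOrEqual(le.attribute(), le.value());
    } else if (filter instanceof And) {
      And and = (And) filter;
      Expression left = convert(and.left());
      Expression right = convert(and.right());
      return (left != null && right != null) ? Expressions.and(left, right) : null;
    }
    return null;
  }
}
```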
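Finally, for the multi-level pushdown item, the essential trick is expanding a lexicographic (tuple-order) bound over several partition columns into ordinary per-column predicates that Iceberg can already evaluate. The helper below is an illustrative reconstruction of the lower-bound case, not the exact expression type we added; column names and values are placeholders.

```java
import java.util.List;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;

public final class MultiLevelPartitionBound {
  private MultiLevelPartitionBound() {}

  // Expands cols >= bound in lexicographic order. For (year, month, day) >=
  // ('2023', '01', '01') this produces:
  //   year > '2023'
  //   OR (year = '2023' AND (month > '01'
  //       OR (month = '01' AND day >= '01')))
  public static Expression greaterThanOrEqual(List<String> cols, List<String> bound) {
    return build(cols, bound, 0);
  }

  private static Expression build(List<String> cols, List<String> bound, int i) {
    String col = cols.get(i);
    String val = bound.get(i);
    if (i == cols.size() - 1) {
      // the last level keeps the equality, so the bound itself is included
      return Expressions.greaterThanOrEqual(col, val);
    }
    return Expressions.or(
        Expressions.greaterThan(col, val),
        Expressions.and(Expressions.equal(col, val), build(cols, bound, i + 1)));
  }
}
```

For example, `greaterThanOrEqual(List.of("year", "month", "day"), List.of("2023", "01", "01"))` yields an expression Iceberg can use to prune manifests and data files; the upper bound is handled symmetrically with `lessThan`/`lessThanOrEqual`.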