tomtongue commented on code in PR #8931: URL: https://github.com/apache/iceberg/pull/8931#discussion_r1386791377
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/MigrateTableSparkAction.java:
##########
@@ -108,6 +109,23 @@ public MigrateTableSparkAction backupTableName(String tableName) {
     return this;
   }
 
+  @Override
+  public MigrateTableSparkAction destCatalogName(String catalogName) {
+    CatalogManager catalogManager = spark().sessionState().catalogManager();
+
+    CatalogPlugin catalogPlugin;
+    if (catalogManager.isCatalogRegistered(catalogName)) {
+      catalogPlugin = catalogManager.catalog(catalogName);
+    } else {
+      LOG.warn(
+          "{} doesn't exist in SparkSession. " + "Fallback to current SparkSession catalog.",
+          catalogName);
+      catalogPlugin = catalogManager.currentCatalog();
+    }
+    this.destCatalog = checkDestinationCatalog(catalogPlugin);

Review Comment:
Thanks for the review, @singhpk234. Yes, as you say, the Iceberg GlueCatalogImpl replicates only "partial" metadata in the rename. So if the source Spark/Hive table is partitioned, the restore process fails as follows:

```
23/11/06 09:54:03 INFO MigrateTableSparkAction: Generating Iceberg metadata for db.tbl in s3://bucket/path/tbl/metadata
23/11/06 09:54:03 WARN BaseCatalogToHiveConverter: Hive Exception type not found for AccessDeniedException
23/11/06 09:54:05 INFO ClientConfigurationFactory: Set initial getObject socket timeout to 2000 ms.
23/11/06 09:54:06 INFO CodeGenerator: Code generated in 230.388332 ms
23/11/06 09:54:06 INFO CodeGenerator: Code generated in 17.169875 ms
23/11/06 09:54:06 INFO CodeGenerator: Code generated in 18.598328 ms
23/11/06 09:54:07 ERROR MigrateTableSparkAction: Failed to perform the migration, aborting table creation and restoring the original table
23/11/06 09:54:07 INFO MigrateTableSparkAction: Restoring db.tbl from db.tbl_backup
23/11/06 09:54:08 INFO GlueCatalog: created rename destination table db.tbl
23/11/06 09:54:08 INFO GlueCatalog: Successfully dropped table db.tbl_backup from Glue
23/11/06 09:54:08 INFO GlueCatalog: Dropped table: db.tbl_backup
23/11/06 09:54:08 INFO GlueCatalog: Successfully renamed table from db.tbl_backup to garbagedb.iceberg_migrate_w_year_partition
Exception in thread "main" org.apache.iceberg.exceptions.ValidationException: Unable to get partition spec for table: `db`.`tbl_backup`
	at org.apache.iceberg.spark.SparkExceptionUtil.toUncheckedException(SparkExceptionUtil.java:55)
	at org.apache.iceberg.spark.SparkTableUtil.importSparkTable(SparkTableUtil.java:415)
	at org.apache.iceberg.spark.SparkTableUtil.importSparkTable(SparkTableUtil.java:460)
	...
Caused by: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Table tbl_backup is not a partitioned table
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:133)
	at org.apache.spark.sql.hive.HiveExternalCatalog.doListPartitions(HiveExternalCatalog.scala:1308)
	at org.apache.spark.sql.hive.HiveExternalCatalog.listPartitions(HiveExternalCatalog.scala:1302)
	...
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Table tbl_backup is not a partitioned table
	at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2676)
	at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:2709)
	...
```

This error occurs because the partition information is lost in the renamed table.
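To make the failure mode concrete, here is a minimal, self-contained sketch of it. It is a toy model, not the Iceberg, Spark, or Glue API: `TableMeta`, `renamePartial`, `renameFull`, and `listPartitionKeys` are hypothetical names, and the assumption that the partial rename drops exactly the partition keys is my simplification of the behavior described above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: none of these types exist in Iceberg, Spark, or Glue.
class PartialRenameSketch {

  /** Hypothetical stand-in for a catalog table entry. */
  static final class TableMeta {
    final String name;
    final List<String> columns;
    final List<String> partitionKeys;

    TableMeta(String name, List<String> columns, List<String> partitionKeys) {
      this.name = name;
      this.columns = columns;
      this.partitionKeys = partitionKeys;
    }
  }

  /** Rename that copies only part of the metadata: partition keys are dropped (assumed). */
  static TableMeta renamePartial(TableMeta source, String newName) {
    return new TableMeta(newName, source.columns, Collections.<String>emptyList());
  }

  /** Rename that carries the full metadata over, including partition keys. */
  static TableMeta renameFull(TableMeta source, String newName) {
    return new TableMeta(newName, source.columns, source.partitionKeys);
  }

  /** Mimics a partition lookup that rejects tables without partition keys. */
  static List<String> listPartitionKeys(TableMeta table) {
    if (table.partitionKeys.isEmpty()) {
      throw new IllegalStateException("Table " + table.name + " is not a partitioned table");
    }
    return table.partitionKeys;
  }

  public static void main(String[] args) {
    TableMeta original =
        new TableMeta("tbl", Arrays.asList("id", "year"), Arrays.asList("year"));

    // A full rename keeps the partition keys, so the lookup succeeds.
    TableMeta fullBackup = renameFull(original, "tbl_backup");
    System.out.println("full rename keeps: " + listPartitionKeys(fullBackup));

    // A partial rename loses them, so the same lookup throws.
    TableMeta partialBackup = renamePartial(original, "tbl_backup");
    try {
      listPartitionKeys(partialBackup);
    } catch (IllegalStateException e) {
      System.out.println("partial rename fails: " + e.getMessage());
    }
  }
}
```

In this model, any step that reads partitions from the backup table after a partial rename fails in the same way as the `importSparkTable` call in the stack trace above.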
So, as you know, the best way to lift the migrate restriction would be for GlueHiveMetastoreClient to support rename. There are users who have tried to migrate their tables to Iceberg on a custom catalog such as Glue Catalog, but the migrate query cannot be used because of the rename restriction. Let me look for a better way to resolve this issue. If there is none, I think we need to ask for GlueHiveMetastoreClient to support the rename operation.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org