sauliusvl opened a new issue, #11814: URL: https://github.com/apache/iceberg/issues/11814
### Apache Iceberg version

1.6.1

### Query engine

Spark

### Please describe the bug 🐞

We have observed the following situation a few times now when using the lock-free Hive catalog commits introduced in https://github.com/apache/iceberg/pull/6570:

We run `ALTER TABLE table SET TBLPROPERTIES ('key' = 'value')`, or any other operation that results in an Iceberg commit, from Spark or any other engine. For whatever reason the connection to the Hive metastore is broken, and the HMS operation fails on the first attempt:

```
WARN org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. alter_table_with_environmentContext
org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
<...>
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1693)
<...>
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
<...>
	at org.apache.iceberg.hive.MetastoreUtil.alterTable(MetastoreUtil.java:78)
	at org.apache.iceberg.hive.HiveOperationsBase.lambda$persistTable$0(HiveOperationsBase.java:112)
<...>
	at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:239)
	at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135)
<...>
	at org.apache.iceberg.spark.SparkCatalog.alterTable(SparkCatalog.java:345)
<...>
```

However, the operation actually succeeds on the server and updates the metadata location. This means that when the `RetryingMetaStoreClient` resubmits the operation, it fails with:

```
MetaException(message:The table has been modified. The parameter value for key 'metadata_location' is '<new>'.
The expected was value was '<previous>')
```

The Iceberg commit is then considered failed, and the new metadata file is cleaned up in the `finally` block [here](https://github.com/apache/iceberg/blob/b428fbc59bd1579f4dc918a5cd48fce667d81ce1/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L320) before the commit is retried. The problem is that the Hive table already has the new metadata location set, so when Iceberg tries to refresh the table, the refresh fails because the new metadata file no longer exists, leaving the table in a corrupted state.

One possible fix would be to inspect the exception and ignore the failure when the location already set in HMS equals the new metadata location, but parsing the error message sounds very hacky.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
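A less brittle variant of the fix proposed in the issue might avoid parsing the error message entirely: after the failed alter call, re-read the table from HMS and compare the `metadata_location` it now holds against the location the commit tried to write, treating the commit as successful (and skipping the `finally`-block cleanup) when they match. The sketch below is purely illustrative; `CommitStatusCheck`, `checkCommitStatus`, and the `CommitStatus` enum are hypothetical names, not actual Iceberg internals:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: after alter_table throws, decide whether the commit
// actually went through by comparing the location HMS now holds against the
// location we attempted to write, instead of parsing the MetaException text.
public class CommitStatusCheck {

  static final String METADATA_LOCATION_PROP = "metadata_location";

  enum CommitStatus { SUCCESS, FAILURE, UNKNOWN }

  // `currentParams` stands in for the table parameters as re-read from the
  // metastore after the failed alter call; null means the re-read itself
  // failed and we cannot tell either way.
  static CommitStatus checkCommitStatus(Map<String, String> currentParams,
                                        String attemptedLocation) {
    if (currentParams == null) {
      // Status is unknown: the new metadata file must NOT be deleted,
      // otherwise a commit that actually succeeded corrupts the table.
      return CommitStatus.UNKNOWN;
    }
    String actual = currentParams.get(METADATA_LOCATION_PROP);
    return attemptedLocation.equals(actual)
        ? CommitStatus.SUCCESS   // first attempt landed; retry error is benign
        : CommitStatus.FAILURE;  // someone else won the race; safe to clean up
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put(METADATA_LOCATION_PROP, "s3://bucket/metadata/00001.json");

    // The retry failed, but HMS already holds the location we wrote:
    // the first attempt succeeded, so this is a SUCCESS, not a failure.
    System.out.println(checkCommitStatus(params, "s3://bucket/metadata/00001.json"));
    System.out.println(checkCommitStatus(params, "s3://bucket/metadata/00002.json"));
    System.out.println(checkCommitStatus(null, "s3://bucket/metadata/00002.json"));
  }
}
```

The key design point is the three-valued result: when the table cannot be re-read, cleanup must be skipped, because deleting the new metadata file on an unconfirmed failure is exactly what corrupts the table in the scenario described above.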