Noremac201 opened a new issue, #14096: URL: https://github.com/apache/iceberg/issues/14096
### Apache Iceberg version

1.9.2

### Query engine

Spark

### Please describe the bug 🐞

* Using Apache Iceberg 1.9.2 with Hive 3.1.2 locks
* Affects any persistence via [`persistTable`](https://github.com/apache/iceberg/blob/apache-iceberg-1.9.2/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveOperationsBase.java#L128-L129)

If the Hive Metastore `alterTable` API call fails, for example because the connection is lost, it is retried within the same Iceberg transaction, so to speak. Depending on how long the original `alterTable` call takes to execute on the HMS side, this can lead to data loss: the table "time travels" back to the state written by the initial call once that call finally completes.

For example:

1. `AlterTable0` starts at T(0).
2. The connection is lost, and RetryingHmsHandler starts the exact same `alterTable` call at T(1). Call this `AlterTable0Retry`.
3. `AlterTable0Retry` succeeds.
4. Iceberg releases the Hive lock.
5. Any number of `alterTable` calls come in and update the metadata location.
6. `AlterTable0` finally completes.
7. Any future table `refresh()` call reverts all the way back to `AlterTable0`'s `metadata_location`, effectively losing data.

Notes and questions:

1. This would have been prevented without Hive locks, because the transactional table parameter update would have failed.
2. Is this a known issue that can be prevented via different configurations?

My assumption is that without the retrying HMS handler, the Iceberg transaction would have been retried as a separate transaction, waiting for the Hive lock to become available. However, if the `alterTable` call takes O(minutes) rather than just a couple of seconds, the same issue still arises.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [x] I cannot contribute a fix for this bug at this time

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
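To make the timeline in the report concrete, here is a minimal simulation sketch of the seven steps. It is not Iceberg or Hive code: `FakeMetastore`, `StaleCommitError`, `replay`, and the `vN.metadata.json` names are hypothetical, and the `expected_previous` check only stands in for the conditional ("transactional") parameter update the report mentions. It assumes the delayed `AlterTable0` is applied by HMS as a blind overwrite when no such check is in place.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class StaleCommitError(Exception):
    """Raised when the expected previous metadata location no longer matches."""


@dataclass
class FakeMetastore:
    """Hypothetical stand-in for HMS that tracks only metadata_location."""
    metadata_location: str
    history: List[str] = field(default_factory=list)

    def alter_table(self, new_location: str,
                    expected_previous: Optional[str] = None) -> None:
        # With expected_previous set, behave like a compare-and-swap;
        # without it, the last writer wins unconditionally.
        if (expected_previous is not None
                and expected_previous != self.metadata_location):
            raise StaleCommitError(
                f"expected {expected_previous}, found {self.metadata_location}")
        self.metadata_location = new_location
        self.history.append(new_location)


def replay(check_previous: bool) -> str:
    """Replay the timeline and return the final metadata_location."""
    hms = FakeMetastore("v0.metadata.json")

    def expected(prev: str) -> Optional[str]:
        return prev if check_previous else None

    # Step 1: AlterTable0 (v0 -> v1) starts but stalls on the HMS side.
    delayed_new, delayed_prev = "v1.metadata.json", "v0.metadata.json"

    # Steps 2-3: the retry of the same call succeeds first.
    hms.alter_table("v1.metadata.json", expected("v0.metadata.json"))

    # Steps 4-5: the lock is released and later commits advance the table.
    hms.alter_table("v2.metadata.json", expected("v1.metadata.json"))
    hms.alter_table("v3.metadata.json", expected("v2.metadata.json"))

    # Step 6: the stalled AlterTable0 finally lands.
    try:
        hms.alter_table(delayed_new, expected(delayed_prev))
    except StaleCommitError:
        pass  # the conditional update rejects the stale write

    # Step 7: this is what a later refresh() would observe.
    return hms.metadata_location


if __name__ == "__main__":
    print("blind overwrite:", replay(check_previous=False))
    print("conditional update:", replay(check_previous=True))
```

In this sketch the blind-overwrite replay ends on `v1.metadata.json`, discarding the two later commits, while the conditional replay keeps `v3.metadata.json` because the stale write no longer matches the expected previous location.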
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
