difin opened a new issue, #9720: URL: https://github.com/apache/iceberg/issues/9720
### Query engine

Apache Hive

### Question

Hello Iceberg Community,

Background: implementation of Iceberg compaction in Apache Hive.

Presently, Apache Hive has Major Query-Based Iceberg compaction, which compacts the whole table by internally executing the command `insert overwrite table <TableName> select * from <TableName>;`

Iceberg insert overwrite (IOW) isn't supported on a table that has had partition or schema evolution, because it can lead to wrong results upon querying. For this reason, at the commit stage the compaction IOW command deletes all files in the table and then adds the new compacted files. That creates 2 snapshots, and it can lead to a data correctness problem: if a user queries the table by the id of the snapshot in which all files have been deleted, it gives the impression that there was no data in the table at that point in time.

Another possibility that we considered is the RewriteFiles API, which allows deleting all data and delete files and adding the new compacted files in one commit. With this approach, however, we need to build a list of all the existing data and delete files to pass to the RewriteFiles API, which can be a problem if a table has thousands of files.

Does Iceberg have an API that can perform IOW in a single commit, without listing all the existing data/delete files as RewriteFiles requires? If not, could you consider implementing such an API?
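To make the snapshot concern concrete, the two commit strategies described in the question can be modeled with a small toy in Java. This is only a sketch and not the Iceberg API: `SnapshotLogSketch`, `commit`, `twoStepOverwrite` and `atomicRewrite` are hypothetical names introduced for illustration, and a snapshot is assumed to be nothing more than the set of live file names after a commit.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only, NOT the Iceberg API: every commit appends one immutable
// snapshot that records the set of live file names at that point in time.
final class SnapshotLogSketch {
    private final List<Set<String>> snapshots = new ArrayList<>();

    SnapshotLogSketch(Set<String> initialFiles) {
        snapshots.add(Set.copyOf(initialFiles));
    }

    // One commit: remove toDelete, add toAdd, return the new snapshot id.
    int commit(Set<String> toDelete, Set<String> toAdd) {
        Set<String> next = new HashSet<>(current());
        next.removeAll(toDelete);
        next.addAll(toAdd);
        snapshots.add(Set.copyOf(next));
        return snapshots.size() - 1;
    }

    Set<String> current() { return snapshots.get(snapshots.size() - 1); }
    Set<String> filesAt(int snapshotId) { return snapshots.get(snapshotId); }
    int snapshotCount() { return snapshots.size(); }

    // Delete everything, then add the compacted files: two new snapshots.
    // Returns the id of the intermediate snapshot, which contains no files.
    static int twoStepOverwrite(SnapshotLogSketch log, Set<String> compacted) {
        int emptyId = log.commit(log.current(), Set.of());
        log.commit(Set.of(), compacted);
        return emptyId;
    }

    // Swap old files for compacted ones in a single commit: one new snapshot,
    // but the caller must supply the full set of files to remove.
    static int atomicRewrite(SnapshotLogSketch log, Set<String> compacted) {
        return log.commit(log.current(), compacted);
    }

    public static void main(String[] args) {
        SnapshotLogSketch twoStep = new SnapshotLogSketch(Set.of("f1", "f2", "f3"));
        int emptyId = twoStepOverwrite(twoStep, Set.of("compacted-1"));
        System.out.println("two-step: snapshots=" + twoStep.snapshotCount()
                + ", files at intermediate snapshot=" + twoStep.filesAt(emptyId).size());

        SnapshotLogSketch atomic = new SnapshotLogSketch(Set.of("f1", "f2", "f3"));
        int id = atomicRewrite(atomic, Set.of("compacted-1"));
        System.out.println("atomic: snapshots=" + atomic.snapshotCount()
                + ", files at new snapshot=" + atomic.filesAt(id).size());
    }
}
```

In this model, time travel to the id returned by `twoStepOverwrite` sees an empty table, which is the correctness concern raised above. `atomicRewrite` avoids the empty snapshot, but it still has to pass the complete current file set as the files to delete, which corresponds to the listing cost the question attributes to RewriteFiles.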