[I] How to insert overwrite with a single commit [iceberg]

via GitHub Tue, 13 Feb 2024 13:31:55 -0800


difin opened a new issue, #9720:
URL: https://github.com/apache/iceberg/issues/9720


   ### Query engine
   
   Apache Hive
   
   ### Question
   
   Hello Iceberg Community,
   
   Background: we are implementing Iceberg compaction in Apache Hive. 
   Presently, Apache Hive has Major Query-Based Iceberg compaction, which 
compacts the whole table by internally executing the command `insert overwrite 
table <TableName> select * from <TableName>;`
   Iceberg IOW (insert overwrite) isn't supported on a table that has had 
partition/schema evolution, because it can lead to wrong query results. So at 
the commit stage this compaction IOW command deletes all files in the table and 
then adds the new compacted files. That creates two snapshots, which can cause 
a data correctness problem: if a user queries the table by the ID of the 
snapshot in which all files have been deleted, it gives the impression that at 
that point in time there was no data in the table.
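
   To make the problem concrete, here is a minimal Python sketch of the two-commit behaviour. It is a toy model, not Iceberg or Hive code: `ToyTable`, `commit`, `files_at` and `two_step_overwrite` are hypothetical names invented for illustration, and a snapshot is reduced to a set of file names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy stand-in for a table: every commit appends one immutable snapshot."""
    snapshots: list = field(default_factory=list)

    def current_files(self):
        # Files visible in the latest snapshot (empty if nothing was committed).
        return self.snapshots[-1] if self.snapshots else frozenset()

    def commit(self, deleted=(), added=()):
        # Apply deletes and adds on top of the current state, record a snapshot,
        # and return its id (here simply its index in the history).
        files = (self.current_files() - frozenset(deleted)) | frozenset(added)
        self.snapshots.append(files)
        return len(self.snapshots) - 1

    def files_at(self, snapshot_id):
        # "Time travel": read the file set of an older snapshot.
        return self.snapshots[snapshot_id]


def two_step_overwrite(table, compacted):
    # Commit 1 removes every file; commit 2 adds the compacted files.
    empty_id = table.commit(deleted=table.current_files())
    final_id = table.commit(added=compacted)
    return empty_id, final_id


table = ToyTable()
base_id = table.commit(added={"a.parquet", "b.parquet", "c.parquet"})
empty_id, final_id = two_step_overwrite(table, {"compacted.parquet"})

# Reading by the intermediate snapshot id shows a table with no files at all.
print(sorted(table.files_at(empty_id)))
print(sorted(table.files_at(final_id)))
```

   In this model the intermediate snapshot is a valid, queryable state with zero files, which is exactly the state we would like to avoid exposing.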
   
   Another possibility that we considered is to use the RewriteFiles API, which 
allows deleting all data files and delete files and adding the new compacted 
files in one commit. However, this approach requires building a list of all the 
existing data and delete files to pass to the RewriteFiles API, which can be a 
problem if a table has thousands of files.
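
   The trade-off can be sketched with another small Python toy model. Again, this is not the Iceberg API: `ToyTable.rewrite` only imitates the shape of a rewrite-style commit that must be told which files to remove, and `replace_all` is a purely hypothetical operation of the kind asked about below.

```python
class ToyTable:
    """Toy table whose history is a list of immutable file sets."""

    def __init__(self, files):
        self.snapshots = [frozenset(files)]

    def current_files(self):
        return self.snapshots[-1]

    def rewrite(self, to_delete, to_add):
        # Single commit, but the caller must name every file to be removed.
        to_delete = frozenset(to_delete)
        missing = to_delete - self.current_files()
        if missing:
            raise ValueError(f"{len(missing)} listed files are not in the table")
        remaining = self.current_files() - to_delete
        self.snapshots.append(remaining | frozenset(to_add))

    def replace_all(self, to_add):
        # Hypothetical: drop whatever is current and add the new files in one
        # commit, without the caller enumerating the existing files.
        self.snapshots.append(frozenset(to_add))


existing = [f"data-{i}.parquet" for i in range(5000)]

# Rewrite-style: the full listing has to be built and passed in.
t1 = ToyTable(existing)
listed = list(t1.current_files())
t1.rewrite(listed, ["compacted.parquet"])

# Hypothetical replace-all: no listing is needed on the caller's side.
t2 = ToyTable(existing)
t2.replace_all(["compacted.parquet"])

print(len(listed), len(t1.snapshots), len(t2.snapshots))
```

   Both variants add exactly one snapshot; the difference is only whether the caller has to materialise the list of existing files first.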
   
   Does Iceberg have an API that can perform IOW in a single commit, without 
listing all the existing data/delete files as RewriteFiles requires? If not, 
would you consider implementing such an API?
   


