Re: [I] De-Duping Rows While Compacting [iceberg]

via GitHub Thu, 12 Oct 2023 15:25:22 -0700


dramaticlly commented on issue #8702:
URL: https://github.com/apache/iceberg/issues/8702#issuecomment-1760455077


   Data compaction only changes the physical file layout, not the data visible 
to users. Suppose you originally have 1000 records, 10 of which are duplicates: 
after deduplication there would be 990 records, and the file layout would 
change as well. I think deduplication (with the ability to identify a row by 
primary key or unique row identifier) probably needs its own action/procedure 
instead of relying on data compaction.
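
To make the distinction concrete, here is a minimal sketch in plain Python. It is not Iceberg code: `compact`, `deduplicate`, the `id` column, and the file sizes are all hypothetical names and values chosen for illustration. It only models the point above, that compaction regroups the same rows into fewer files while deduplication changes how many rows a reader sees.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def compact(files: List[List[Row]], target_rows_per_file: int) -> List[List[Row]]:
    """Regroup rows from many small files into fewer larger ones (hypothetical helper).

    Every input row is kept, so the data visible to a reader is unchanged.
    """
    rows = [row for f in files for row in f]
    return [
        rows[i:i + target_rows_per_file]
        for i in range(0, len(rows), target_rows_per_file)
    ]


def deduplicate(rows: Iterable[Row], key: str) -> List[Row]:
    """Keep the first row seen for each key value (hypothetical helper).

    This drops rows, so the data visible to a reader changes.
    """
    seen = set()
    unique_rows = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            unique_rows.append(row)
    return unique_rows


# 990 distinct ids plus 10 repeated ids: 1000 records, 10 of them duplicates.
records = [{"id": i} for i in range(990)] + [{"id": i} for i in range(10)]

# Spread the records over 20 small "files" of 50 rows each.
small_files = [records[i:i + 50] for i in range(0, len(records), 50)]

compacted = compact(small_files, target_rows_per_file=500)
deduped = deduplicate(records, key="id")

print(len(small_files), "files ->", len(compacted), "files after compaction")
print(sum(len(f) for f in compacted), "rows after compaction")
print(len(deduped), "rows after deduplication")
```

In this model the compaction step needs no notion of row identity at all, whereas the deduplication step cannot run without a key, which is one reason a separate action/procedure seems like the more natural home for it.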


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


