Re: [I] De-Duping Rows While Compacting [iceberg]

via GitHub Mon, 16 Oct 2023 14:30:33 -0700


W-I-D-EE commented on issue #8702:
URL: https://github.com/apache/iceberg/issues/8702#issuecomment-1765308407


   Further to this, I have had a lot of trouble getting DELETE FROM or 
MERGE INTO to remove duplicate rows. So far, the only way I have been 
able to remove duplicates is to select the dataset, call 
DataFrame.dropDuplicates in Spark, and then use a dynamic overwrite 
to rewrite the partition.
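   
   The workaround described above can be sketched roughly as follows in PySpark. This is a minimal sketch, not the exact code used here: the function name, its arguments, and the table and filter shown in the usage note are all hypothetical, and it assumes `spark` is an existing SparkSession with an Iceberg catalog already configured.

```python
def dedupe_partition(spark, table, partition_filter, key_cols=None):
    """Rewrite the partitions matched by `partition_filter` without duplicates.

    Every name here is hypothetical; `spark` is assumed to be a SparkSession
    that can already read and write the Iceberg table `table`.
    """
    # Read only the partition(s) that should be rewritten.
    rows = spark.table(table).where(partition_filter)

    # With no key columns, dropDuplicates() compares entire rows;
    # with key columns, it keeps one arbitrary row per key.
    deduped = rows.dropDuplicates(key_cols) if key_cols else rows.dropDuplicates()

    # DataFrameWriterV2.overwritePartitions() replaces only the partitions
    # present in `deduped`, which is a dynamic partition overwrite.
    deduped.writeTo(table).overwritePartitions()
```

   A call might look like `dedupe_partition(spark, "db.events", "day = '2023-10-16'", ["id"])`. Note that when key columns are given, which of the duplicate rows survives is not deterministic.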
   
   Does anyone know of a better way to do this? Everything I have tried with 
MERGE INTO or DELETE FROM results in all records being removed instead 
of just the duplicate rows.
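   
   A possible explanation, offered as an assumption and not confirmed in this thread: a delete predicate is evaluated against each row's values, so exact copies of a row either all match or all fail to match. The plain-Python sketch below models a table as a list of tuples to show the difference between the two behaviours. It does not call Spark or Iceberg, and the sample rows are made up.

```python
from collections import Counter

# A toy "table": the row ("a", 1) is stored twice.
rows = [("a", 1), ("a", 1), ("b", 2)]

# Predicate-style delete: remove every row whose value occurs more than once.
# Identical copies cannot be told apart, so both copies of ("a", 1) go.
counts = Counter(rows)
after_delete = [r for r in rows if counts[r] == 1]

# De-duplication: keep the first copy of each distinct row.
after_dedupe = list(dict.fromkeys(rows))

print(after_delete)  # [('b', 2)]
print(after_dedupe)  # [('a', 1), ('b', 2)]
```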
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


