syun64 commented on code in PR #569:
URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1632276771
##########
pyiceberg/table/__init__.py:
##########
@@ -434,6 +456,9 @@ def overwrite(
         if table_arrow_schema != df.schema:
             df = df.cast(table_arrow_schema)
 
+        with self.update_snapshot(snapshot_properties=snapshot_properties).delete() as delete_snapshot:
+            delete_snapshot.delete_by_predicate(overwrite_filter)
+
         with self.update_snapshot(snapshot_properties=snapshot_properties).overwrite() as update_snapshot:
             # skip writing data files if the dataframe is empty

Review Comment:
   @Fokko @kevinjqliu - I thought about this a bit more, and I think the order does matter. The reason is how the order interacts with other metadata-related features in Iceberg.

   The partitions metadata table is a great example. It is built by looking up the `snapshot_id` associated with each partition, and that ID is taken from the snapshot in which the data file was added (appended). If a user time travels to this ID and the order is `delete + append`, they will see the desired state of the table. If the order is `append + delete`, they will see the table in the intermediate state of the transaction.

   The order is correct in the current implementation, but I just wanted to point this out for our own records.
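
   The ordering argument in the review comment above can be sketched with a toy model. This is only an illustration and does not use PyIceberg: `ToyTable`, `overwrite`, and the file names are hypothetical, and each "snapshot" is simply the set of live files after a commit. The sketch records which snapshot added each file, which is an assumption standing in for how the partitions metadata table picks its `snapshot_id`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table; not the PyIceberg API."""

    # Each entry is the full set of live files after one commit.
    snapshots: list = field(default_factory=list)
    # Maps a file name to the index of the snapshot that added it.
    added_in: dict = field(default_factory=dict)

    def current(self) -> frozenset:
        return self.snapshots[-1] if self.snapshots else frozenset()

    def _commit(self, files) -> int:
        self.snapshots.append(frozenset(files))
        return len(self.snapshots) - 1

    def append(self, name: str) -> int:
        snapshot_id = self._commit(self.current() | {name})
        self.added_in[name] = snapshot_id
        return snapshot_id

    def delete(self, name: str) -> int:
        return self._commit(self.current() - {name})


def overwrite(table: ToyTable, old: str, new: str, delete_first: bool) -> frozenset:
    """Replace `old` with `new` as two commits, in the requested order.

    Returns the table state at the snapshot that added `new`, i.e. what a
    reader would see when time travelling to that snapshot.
    """
    if delete_first:
        table.delete(old)
        table.append(new)
    else:
        table.append(new)
        table.delete(old)
    return table.snapshots[table.added_in[new]]


# delete + append: the snapshot that added the new file is the final state.
t1 = ToyTable()
t1.append("old.parquet")
state_delete_first = overwrite(t1, "old.parquet", "new.parquet", delete_first=True)

# append + delete: the snapshot that added the new file still contains the old file.
t2 = ToyTable()
t2.append("old.parquet")
state_append_first = overwrite(t2, "old.parquet", "new.parquet", delete_first=False)
```

   In this toy model, `state_delete_first` contains only the new file, while `state_append_first` contains both files, which mirrors the "middle state of the transaction" concern raised above.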