Fokko commented on code in PR #1879: URL: https://github.com/apache/iceberg-python/pull/1879#discussion_r2044268567
##########
tests/integration/test_deletes.py:
##########
@@ -467,21 +467,19 @@ def test_partitioned_table_positional_deletes_sequence_number(spark: SparkSessio
     assert snapshots[2].summary == Summary(
         Operation.OVERWRITE,

Review Comment:
   When I change the table to copy-on-write (CoW), I get the following summary for snapshot 1 (the delete performed by Spark):

   ```json
   {
     "spark.app.id": "local-1744714815877",
     "added-data-files": "1",
     "deleted-data-files": "1",
     "added-records": "1",
     "deleted-records": "2",
     "added-files-size": "714",
     "removed-files-size": "743",
     "changed-partition-count": "1",
     "total-records": "4",
     "total-files-size": "1461",
     "total-data-files": "2",
     "total-delete-files": "0",
     "total-position-deletes": "0",
     "total-equality-deletes": "0",
     "engine-version": "3.5.1",
     "app-id": "local-1744714815877",
     "engine-name": "spark",
     "iceberg-version": "Apache Iceberg 1.8.0 (commit c277c2014a1b37fe755cfe37f173b6465bb8cb73)"
   }
   ```

   Which seems correct:

   ```
   (10, 100),
   (10, 101), <- Deleted by Spark
   (20, 200),
   (20, 201),
   (20, 202)
   ```

   PyIceberg takes a different approach, where this is an `Overwrite`: it first creates a snapshot that rewrites the original data file, then appends a new file with the updated record.

   To reproduce this, I just removed the `TBLPROPERTIES` that set MoR.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
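
As a rough sanity check of the "seems correct" reading, the parent snapshot's totals can be backed out from the deltas in the quoted summary. This is only a sketch: it assumes each total follows `total = previous + added - removed`, and `previous_totals` is a hypothetical helper written for this illustration, not part of PyIceberg or Spark. Only the counters needed for the check are copied from the summary.

```python
import json

# Counters copied from the snapshot 1 summary quoted in the comment.
SUMMARY_JSON = """
{
  "added-data-files": "1",
  "deleted-data-files": "1",
  "added-records": "1",
  "deleted-records": "2",
  "added-files-size": "714",
  "removed-files-size": "743",
  "total-records": "4",
  "total-files-size": "1461",
  "total-data-files": "2"
}
"""


def previous_totals(summary):
    """Hypothetical helper: undo the deltas to recover the parent snapshot's totals.

    Assumes total = previous + added - removed for every counter.
    """
    s = {key: int(value) for key, value in summary.items()}  # values are strings
    return {
        "records": s["total-records"] - s["added-records"] + s["deleted-records"],
        "data-files": s["total-data-files"] - s["added-data-files"] + s["deleted-data-files"],
        "files-size": s["total-files-size"] - s["added-files-size"] + s["removed-files-size"],
    }


prev = previous_totals(json.loads(SUMMARY_JSON))
print(prev)  # {'records': 5, 'data-files': 2, 'files-size': 1490}
```

Under that assumption, the parent snapshot held five records in two data files, which matches the five rows listed in the comment. The CoW delete then drops one two-record file and adds a one-record replacement, leaving four records.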