Re: [PR] Fix the snapshot summary of a partial overwrite [iceberg-python]

via GitHub Tue, 15 Apr 2025 04:07:31 -0700


Fokko commented on code in PR #1879:
URL: https://github.com/apache/iceberg-python/pull/1879#discussion_r2044268567



##########
tests/integration/test_deletes.py:
##########
@@ -467,21 +467,19 @@ def test_partitioned_table_positional_deletes_sequence_number(spark: SparkSessio
     assert snapshots[2].summary == Summary(
         Operation.OVERWRITE,

Review Comment:
   When I change it to CoW, this is what I get for snapshot summary 1 (the delete performed by Spark):
   ```json
   {
       "spark.app.id": "local-1744714815877",
       "added-data-files": "1",
       "deleted-data-files": "1",
       "added-records": "1",
       "deleted-records": "2",
       "added-files-size": "714",
       "removed-files-size": "743",
       "changed-partition-count": "1",
       "total-records": "4",
       "total-files-size": "1461",
       "total-data-files": "2",
       "total-delete-files": "0",
       "total-position-deletes": "0",
       "total-equality-deletes": "0",
       "engine-version": "3.5.1",
       "app-id": "local-1744714815877",
       "engine-name": "spark",
       "iceberg-version": "Apache Iceberg 1.8.0 (commit c277c2014a1b37fe755cfe37f173b6465bb8cb73)"
   }
   ```
   
   Which seems correct:
   ```
   (10, 100), 
   (10, 101), <- Deleted by Spark
   
   (20, 200), 
   (20, 201),
   (20, 202)
   ```
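
The check behind "seems correct" can be written out as a small sketch. It assumes the table held the five rows listed above before the delete, and that they sat in two data files, one per partition; neither number appears in the summary itself. With CoW, Spark drops the two-record file for partition 10 and writes a one-record replacement, so the totals should move accordingly:

```python
# Assumptions (not part of the summary): state of the table before the delete,
# read off the five rows listed above, stored as one data file per partition.
previous_total_records = 5
previous_total_data_files = 2

# Delta and total fields copied from the snapshot summary quoted above.
summary = {
    "added-data-files": "1",
    "deleted-data-files": "1",
    "added-records": "1",
    "deleted-records": "2",
    "total-records": "4",
    "total-data-files": "2",
}


def expected_total(previous: int, added: int, deleted: int) -> int:
    """Running total after applying one snapshot's added/deleted delta."""
    return previous - deleted + added


total_records = expected_total(
    previous_total_records,
    int(summary["added-records"]),
    int(summary["deleted-records"]),
)
total_data_files = expected_total(
    previous_total_data_files,
    int(summary["added-data-files"]),
    int(summary["deleted-data-files"]),
)

# Net effect of the rewrite: exactly one row, (10, 101), is gone.
net_removed = int(summary["deleted-records"]) - int(summary["added-records"])

assert total_records == int(summary["total-records"])
assert total_data_files == int(summary["total-data-files"])
assert net_removed == 1
```
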
   
   PyIceberg takes a different approach and treats this as an `Overwrite`: it first creates a snapshot that rewrites the original data file, then appends a new file with the updated record.
   
   To reproduce this, I just removed the `TBLPROPERTIES` that set MoR.
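
As a rough illustration of that switch, the sketch below builds the Spark DDL with and without the MoR properties. The helper `create_table_ddl`, the table identifier, and the column names are hypothetical and not taken from the test; the property keys are the standard Iceberg write-mode properties, which default to copy-on-write when unset.

```python
def create_table_ddl(identifier: str, merge_on_read: bool) -> str:
    """Build a CREATE TABLE statement, optionally opting in to MoR.

    Hypothetical helper: the schema below is an assumption for illustration.
    """
    ddl = (
        f"CREATE TABLE {identifier} (number_partitioned int, number int) "
        "USING iceberg PARTITIONED BY (number_partitioned)"
    )
    if merge_on_read:
        # Without these properties Iceberg falls back to copy-on-write.
        ddl += (
            " TBLPROPERTIES ("
            "'format-version'='2', "
            "'write.delete.mode'='merge-on-read', "
            "'write.update.mode'='merge-on-read', "
            "'write.merge.mode'='merge-on-read')"
        )
    return ddl


# Hypothetical identifier; pass the resulting string to spark.sql(...) to run it.
mor_ddl = create_table_ddl("default.example_deletes", merge_on_read=True)
cow_ddl = create_table_ddl("default.example_deletes", merge_on_read=False)
```

Dropping the `TBLPROPERTIES` clause is the only difference between the two statements, which is what makes Spark's delete rewrite the data file instead of writing a positional delete file.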



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


