kevinjqliu opened a new issue, #12823:
URL: https://github.com/apache/iceberg/issues/12823

   ### Apache Iceberg version
   
   1.8.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   A merge-on-read (MoR) delete that writes a positional delete file does not 
update `total-records` in the snapshot summary.
   
   This can be seen in the PyIceberg example 
[here](https://github.com/apache/iceberg-python/pull/1926/files#diff-d875bb1b02ed1d4043a6355a53cbc35ef9eb4d862e2c8bed8007642876b3fb7bR496), 
where a single row is deleted but `total-records` remains the same.
   
   A copy-on-write (CoW) delete, where the data file is rewritten, does not have 
this problem: `total-records` is properly decremented, as shown 
[here](https://github.com/apache/iceberg-python/pull/1926/files#diff-d875bb1b02ed1d4043a6355a53cbc35ef9eb4d862e2c8bed8007642876b3fb7bR525) 
(although it is decremented from the previously miscalculated 
`total-records`).
   
   
   I think this issue has persisted for quite a while. I found both #7463 and 
#6709. 
   
   #7463 shows that the delete (`DELETE FROM default.t1 WHERE foo = 'b'`) 
produces an OVERWRITE snapshot with the following summary:
   ```
   {'spark.app.id': 'local-1682689536619', 'changed-partition-count': '1', 
'added-position-deletes': '1', 'total-equality-deletes': '0', 
'total-position-deletes': '1', 'added-position-delete-files': '1', 
'added-files-size': '1490', 'total-delete-files': '1', 'added-delete-files': 
'1', 'total-files-size': '2387', 'total-records': '3', 'total-data-files': '1'}
   ```
   where `'total-records': '3'` is the same as in the previous snapshot even 
though a row has been deleted.
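
To make the discrepancy concrete, here is a minimal Python sketch that works only on the summary values quoted above. The helper `estimate_live_records` is hypothetical (it is not part of Iceberg or PyIceberg), and it assumes there are no equality deletes and that every position delete targets a distinct, still-live row.

```python
# Subset of the OVERWRITE snapshot summary reported in #7463 (values are strings).
summary = {
    "added-position-deletes": "1",
    "total-equality-deletes": "0",
    "total-position-deletes": "1",
    "total-delete-files": "1",
    "total-records": "3",
    "total-data-files": "1",
}


def estimate_live_records(summary: dict[str, str]) -> int:
    """Hypothetical helper: rows a reader would see after applying deletes.

    Assumes no equality deletes and that each position delete removes one
    distinct live row; neither assumption is guaranteed in general.
    """
    if int(summary.get("total-equality-deletes", "0")) != 0:
        raise ValueError("equality deletes cannot be counted from the summary")
    total_records = int(summary["total-records"])
    position_deletes = int(summary.get("total-position-deletes", "0"))
    return total_records - position_deletes


# What the summary reports versus what a scan would be expected to return.
print("total-records in summary:", summary["total-records"])
print("estimated live records:", estimate_live_records(summary))
```

Under those assumptions the summary reports 3 records while only 2 rows remain visible, which is the gap this issue describes.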
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

