kevinjqliu commented on PR #401:
URL: https://github.com/apache/iceberg-go/pull/401#issuecomment-2867338414

   > currently, the snapshots field of the produced metadata file looks ok 
(expired snapshots are not there), but the snapshot-log field still contains 
entries for every operation since table creation. Is it expected? I can't find 
an appropriate answer in the spec.
   
   The `snapshot-log` should be cleaned up as part of expiration. From the 
`snapshot-log` entry in the table metadata fields section of the spec 
(https://iceberg.apache.org/spec/#table-metadata-fields):
   """
   A list (optional) of timestamp and snapshot ID pairs that encodes changes to 
the current snapshot for the table. Each time the current-snapshot-id is 
changed, a new entry should be added with the last-updated-ms and the new 
current-snapshot-id. **When snapshots are expired from the list of valid 
snapshots, all entries before a snapshot that has expired should be removed.**
   """
   
   > should we handle expired data and metadata files deletion here ?
   
   From the 
[ExpireSnapshots](https://iceberg.apache.org/javadoc/1.9.0/org/apache/iceberg/ExpireSnapshots.html)
 javadoc, it looks like you can optionally remove the data and metadata files, but 
it's [set to false by 
default](https://github.com/apache/iceberg/blob/a5bcacd979dc9ac70be3d7e5b93bb967ff04f71a/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L72)
   
   > the produced metadata file does not contain any new snapshot (no 
snapshot is created for the expire snapshots operation). I cannot find out what 
the spec says about it. Most probably a new snapshot must be created.
   
   Also from the [Javadoc for 
`ExpireSnapshots`](https://iceberg.apache.org/javadoc/1.9.0/org/apache/iceberg/ExpireSnapshots.html):
   """
   This API accumulates snapshot deletions and commits the new list to the 
table. This API does not allow deleting the current snapshot.
   
   **When committing, these changes will be applied to the latest table 
metadata**. Commit conflicts will be resolved by applying the changes to the 
new latest metadata and reattempting the commit.
   """
   
   A new snapshot should be created reflecting the newest table update, i.e. 
with the old snapshots removed.
   
   
   Hope this helps! Happy to help with any other questions :) 



