kevinjqliu commented on PR #401: URL: https://github.com/apache/iceberg-go/pull/401#issuecomment-2867338414
> currently, the snapshots field of the produced metadata file looks ok (expired snapshots are not there), but the snapshot-log field still contains entries for every operation since table creation. Is it expected? I don't find an appropriate answer in the spec.

The `snapshot-log` should be cleaned up as part of expiration. From https://iceberg.apache.org/spec/#table-metadata-fields, in the `snapshot-log` entry of the table:

"""
A list (optional) of timestamp and snapshot ID pairs that encodes changes to the current snapshot for the table. Each time the current-snapshot-id is changed, a new entry should be added with the last-updated-ms and the new current-snapshot-id. **When snapshots are expired from the list of valid snapshots, all entries before a snapshot that has expired should be removed.**
"""

> should we handle expired data and metadata files deletion here?

From the [ExpireSnapshots](https://iceberg.apache.org/javadoc/1.9.0/org/apache/iceberg/ExpireSnapshots.html) javadoc, it looks like you can optionally remove the data and metadata files, but it's [set to false by default](https://github.com/apache/iceberg/blob/a5bcacd979dc9ac70be3d7e5b93bb967ff04f71a/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L72).

> the produced metadata file does not contain any new snapshot (no snapshot is created for the expire snapshots operation). I cannot find out what the spec says about it. Most probably a new snapshot must be created.

Also from the [java docs for `ExpireSnapshots`](https://iceberg.apache.org/javadoc/1.9.0/org/apache/iceberg/ExpireSnapshots.html):

"""
This API accumulates snapshot deletions and commits the new list to the table. This API does not allow deleting the current snapshot.

**When committing, these changes will be applied to the latest table metadata**. Commit conflicts will be resolved by applying the changes to the new latest metadata and reattempting the commit.
"""

A new snapshot should be created reflecting the newest table update, i.e. with the old snapshots removed.

Hope this helps! Happy to help with any other questions :)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: issues-h...@iceberg.apache.org