Sl1mb0 opened a new issue, #816:
URL: https://github.com/apache/iceberg-rust/issues/816

   In the [Iceberg specification](https://iceberg.apache.org/spec) it is 
implied that a `ManifestList` `A` and a `ManifestList` `B` may share 
entries. Note that in the following diagram the first and second `ManifestList` 
(from the left) each point to the first `ManifestFile`.
   
   
![image](https://github.com/user-attachments/assets/ccfb4943-04a1-4167-a96b-9b05e1999021)
   
   This implies that the first `ManifestFile` carries the snapshot ID of 
`s0`, since by definition that is the snapshot in which it was created. Snapshot 
`s1` points to a `ManifestList` that points to this `ManifestFile`, meaning that 
snapshot `s1` contains a `ManifestFile` that has the same snapshot ID as `s0`.
   
   I ran into this scenario when trying to commit multiple snapshots for a 
table without any changes to the table between snapshots. Committing multiple 
snapshots means creating a `ManifestList` for each snapshot (since `snapshot -> 
manifest_list` is `1:1`) - but it does **not** mean there is a need to create a 
`ManifestFile` each time you take a snapshot; especially if no data has been 
added to the table. If we are committing a snapshot when there has been no new 
data to add to the table, you _should_ be able to 're-use' the `ManifestFile`s 
that have already been created.
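
   The re-use described above can be sketched as follows. This is a minimal, illustrative model only: the `ManifestFile` and `ManifestList` structs and the `commit_without_changes` function are hypothetical simplifications, not the `iceberg-rust` types or API. The `added_snapshot_id` field mirrors the spec's manifest list field of the same name.

```rust
/// Hypothetical, simplified stand-in for one manifest list entry.
#[derive(Debug, Clone, PartialEq)]
struct ManifestFile {
    manifest_path: String,
    /// ID of the snapshot in which this manifest was first added.
    added_snapshot_id: i64,
}

/// Hypothetical, simplified stand-in for a manifest list (one per snapshot).
#[derive(Debug, Clone)]
struct ManifestList {
    snapshot_id: i64,
    entries: Vec<ManifestFile>,
}

/// Build the manifest list for a new snapshot that adds no data:
/// a new list is created, but the parent's entries are carried over unchanged.
fn commit_without_changes(parent: &ManifestList, new_snapshot_id: i64) -> ManifestList {
    ManifestList {
        snapshot_id: new_snapshot_id,
        entries: parent.entries.clone(),
    }
}

fn main() {
    let s0 = ManifestList {
        snapshot_id: 0,
        entries: vec![ManifestFile {
            manifest_path: "manifest-0.avro".to_string(),
            added_snapshot_id: 0,
        }],
    };

    // Snapshot s1 gets its own manifest list but re-uses the manifest from s0.
    let s1 = commit_without_changes(&s0, 1);

    assert_eq!(s1.snapshot_id, 1);
    assert_eq!(s1.entries, s0.entries);
    // The re-used manifest still records s0 as the snapshot that added it.
    assert_eq!(s1.entries[0].added_snapshot_id, 0);
}
```

   In this model the list for `s1` legitimately holds an entry whose `added_snapshot_id` differs from the list's own `snapshot_id`.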
   
   Specifically the code here:
   
https://github.com/apache/iceberg-rust/blob/f9de01b0584d3cd2e894049987f4dc9fd74a8de4/crates/iceberg/src/spec/manifest_list.rs#L188
   
   and here:
   
https://github.com/apache/iceberg-rust/blob/f9de01b0584d3cd2e894049987f4dc9fd74a8de4/crates/iceberg/src/spec/manifest_list.rs#L200
   
   are where the `iceberg-rust` crate requires that a `ManifestFile`'s snapshot 
ID match the snapshot ID of the `ManifestList` that points to it. According to 
the Iceberg specification, this is wrong.
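
   To make the problem concrete, here is a rough sketch of the kind of strict rule described above and how it rejects a re-used manifest. It is not the code at the linked lines: `ManifestEntry` and `strict_check` are hypothetical names used only for illustration.

```rust
/// Hypothetical, simplified manifest list entry.
#[derive(Debug, Clone)]
struct ManifestEntry {
    /// ID of the snapshot in which the manifest was first added.
    added_snapshot_id: i64,
}

/// Strict rule as described in this issue: every entry must carry the
/// snapshot ID of the manifest list that points to it.
fn strict_check(list_snapshot_id: i64, entries: &[ManifestEntry]) -> Result<(), String> {
    for entry in entries {
        if entry.added_snapshot_id != list_snapshot_id {
            return Err(format!(
                "manifest added in snapshot {} rejected by list for snapshot {}",
                entry.added_snapshot_id, list_snapshot_id
            ));
        }
    }
    Ok(())
}

fn main() {
    // A manifest created in snapshot 0.
    let entries = vec![ManifestEntry { added_snapshot_id: 0 }];

    // The list for snapshot 0 passes the strict rule.
    assert!(strict_check(0, &entries).is_ok());

    // The list for snapshot 1 re-uses the same manifest and is rejected,
    // even though the spec's diagram shows exactly this sharing.
    assert!(strict_check(1, &entries).is_err());
}
```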
   

