paveon opened a new issue, #1117:
URL: https://github.com/apache/iceberg-go/issues/1117
### Feature Request / Improvement
### Apache Iceberg version
main (and v0.5.1)
### Description
`table.removeSnapshotsUpdate.PostCommit` (`table/updates.go`) walks every
expired snapshot's manifest list and opens each referenced manifest file
individually:
```go
for _, snapId := range u.SnapshotIDs {
	snap := preTable.SnapshotByID(snapId)
	mans, err := snap.Manifests(prefs)
	for _, man := range mans {
		for entry, err := range man.Entries(prefs, false) { ... }
	}
}
```
Iceberg manifests are shared by reference across snapshots: an APPEND commit
produces a new manifest list that points at all the existing manifests plus
1–2 new ones. As a result, a shared manifest is opened once per expired
snapshot that references it, instead of once in total.
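To make the difference concrete, here is a minimal sketch of the two traversal strategies. It does not use the iceberg-go API: the `snapshot` type, `naiveOpens`, and `dedupedOpens` are hypothetical stand-ins, and a manifest is modeled as nothing more than its path. A real fix would presumably key the visited set on the manifest file path in a similar way, but that is an assumption about the eventual patch.

```go
package main

import "fmt"

// snapshot is a hypothetical stand-in for an Iceberg snapshot: an ID plus
// the paths of the manifests that its manifest list references.
type snapshot struct {
	id        int64
	manifests []string
}

// naiveOpens counts manifest opens when every expired snapshot's manifest
// list is walked independently, as in the loop quoted above.
func naiveOpens(expired []snapshot) int {
	n := 0
	for _, s := range expired {
		n += len(s.manifests)
	}
	return n
}

// dedupedOpens counts manifest opens when a set of already-visited manifest
// paths is kept across all expired snapshots.
func dedupedOpens(expired []snapshot) int {
	seen := make(map[string]struct{})
	for _, s := range expired {
		for _, path := range s.manifests {
			if _, ok := seen[path]; ok {
				continue // already processed via an earlier snapshot
			}
			seen[path] = struct{}{}
			// A real implementation would open and scan the manifest here.
		}
	}
	return len(seen)
}

func main() {
	// Three append snapshots; each one reuses the previous manifests.
	expired := []snapshot{
		{1, []string{"m1.avro"}},
		{2, []string{"m1.avro", "m2.avro"}},
		{3, []string{"m1.avro", "m2.avro", "m3.avro"}},
	}
	fmt.Println(naiveOpens(expired), dedupedOpens(expired)) // 6 3
}
```

Even with three snapshots the naive walk opens twice as many files; the gap grows quadratically with the number of expired snapshots.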
For a table with 491 incremental-append snapshots, expiring 490 of them
causes roughly sum(1..490) ≈ 120k manifest-file downloads from object
storage, where about 500 unique reads would suffice. We observed a
single-table expire running for hours in this state.
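The estimate can be reproduced with a few lines of arithmetic. The sketch below assumes the simplest case, where the k-th snapshot references exactly k manifests (one new manifest per append and no manifest rewrites); `manifestReads` is a hypothetical helper written for this illustration, not part of iceberg-go.

```go
package main

import "fmt"

// manifestReads returns the number of manifest reads needed to expire the
// first `expired` snapshots of an append-only table, ASSUMING snapshot k
// references exactly k manifests (one new manifest per append).
func manifestReads(expired int) (naive, unique int) {
	for k := 1; k <= expired; k++ {
		naive += k // snapshot k re-reads every manifest it references
	}
	// Under the same assumption, each expired snapshot adds one new manifest.
	return naive, expired
}

func main() {
	naive, unique := manifestReads(490)
	fmt.Printf("naive=%d unique=%d\n", naive, unique) // naive=120295 unique=490
}
```

With 1–2 new manifests per commit, as described above, the unique count lands near the ~500 quoted rather than at exactly 490.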
The retained-snapshot pass that follows this loop has the same shape and
could also dedupe manifests across retained snapshots.
### Willing to contribute
- I can contribute a fix for this bug independently