amogh-jahagirdar commented on code in PR #10523:
URL: https://github.com/apache/iceberg/pull/10523#discussion_r1644764063
##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -423,21 +423,23 @@ public void commit() {
try {
LOG.info("Committed snapshot {} ({})", newSnapshotId.get(),
getClass().getSimpleName());
- // at this point, the commit must have succeeded. after a refresh, the
snapshot is loaded by
- // id in case another commit was added between this commit and the
refresh.
- Snapshot saved = ops.refresh().snapshot(newSnapshotId.get());
- if (saved != null) {
- cleanUncommitted(Sets.newHashSet(saved.allManifests(ops.io())));
- // also clean up unused manifest lists created by multiple attempts
- for (String manifestList : manifestLists) {
- if (!saved.manifestListLocation().equals(manifestList)) {
- deleteFile(manifestList);
+ if (!canSkipCleanupAfterCommitSuccess()) {
Review Comment:
See above, I think this becomes more readable if you invert it, you can just
do `if (cleanupAfterCommit())`
##########
core/src/main/java/org/apache/iceberg/FastAppend.java:
##########
@@ -192,6 +192,11 @@ protected void cleanUncommitted(Set<ManifestFile>
committed) {
}
}
+ @Override
+ protected boolean canSkipCleanupAfterCommitSuccess() {
+ return commitMetrics().attempts().value() == 1;
+ }
Review Comment:
I think I'd rename this to `cleanupAfterCommit` and invert the logic to
`commitMetrics().attempts().value() > 1`. Makes it more readable imo.
##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -563,6 +565,10 @@ protected boolean canInheritSnapshotId() {
return canInheritSnapshotId;
}
+ protected boolean canSkipCleanupAfterCommitSuccess() {
+ return false;
Review Comment:
See above, if we invert this to `cleanupAfterCommit`, this would return true;
##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -423,21 +423,23 @@ public void commit() {
try {
LOG.info("Committed snapshot {} ({})", newSnapshotId.get(),
getClass().getSimpleName());
- // at this point, the commit must have succeeded. after a refresh, the
snapshot is loaded by
- // id in case another commit was added between this commit and the
refresh.
- Snapshot saved = ops.refresh().snapshot(newSnapshotId.get());
- if (saved != null) {
- cleanUncommitted(Sets.newHashSet(saved.allManifests(ops.io())));
- // also clean up unused manifest lists created by multiple attempts
- for (String manifestList : manifestLists) {
- if (!saved.manifestListLocation().equals(manifestList)) {
- deleteFile(manifestList);
+ if (!canSkipCleanupAfterCommitSuccess()) {
+ // at this point, the commit must have succeeded. after a refresh, the
snapshot is loaded by
+ // id in case another commit was added between this commit and the
refresh.
+ Snapshot saved = ops.refresh().snapshot(newSnapshotId.get());
Review Comment:
No it shouldn't. Typically the pattern in Iceberg, is if at some point
someone wants the latest metadata there needs to be an explicit refresh.
Nothing should rely on something being refreshed at some earlier point.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]