vaultah commented on code in PR #13720:
URL: https://github.com/apache/iceberg/pull/13720#discussion_r2293349352
##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java:
##########
@@ -281,24 +281,34 @@ private String rebuildMetadata() {
// rebuild version files
RewriteResult<Snapshot> rewriteVersionResult = rewriteVersionFiles(endMetadata);
Set<Snapshot> deltaSnapshots = deltaSnapshots(startMetadata, rewriteVersionResult.toRewrite());
-
- Set<String> manifestsToRewrite = manifestsToRewrite(deltaSnapshots, startMetadata);
Set<Snapshot> validSnapshots =
Sets.difference(snapshotSet(endMetadata), snapshotSet(startMetadata));
+ // rebuild manifest files
+ Set<ManifestFile> manifestsToRewrite = manifestsToRewrite(validSnapshots);
+
+ Map<String, RewriteContentFileResult> rewriteManifestResult =
+ rewriteManifests(deltaSnapshots, endMetadata, manifestsToRewrite);
+
+ // Extract manifest file sizes for manifest list rewriting
+ Map<String, Long> rewrittenManifestLengths =
+ rewriteManifestResult.entrySet().stream()
+ .collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue().length()));
+
// rebuild manifest-list files
RewriteResult<ManifestFile> rewriteManifestListResult =
validSnapshots.stream()
- .map(snapshot -> rewriteManifestList(snapshot, endMetadata, manifestsToRewrite))
+ .map(snapshot -> rewriteManifestList(snapshot, endMetadata, rewrittenManifestLengths))
.reduce(new RewriteResult<>(), RewriteResult::append);
- // rebuild manifest files
- RewriteContentFileResult rewriteManifestResult =
- rewriteManifests(deltaSnapshots, endMetadata, rewriteManifestListResult.toRewrite());
+ // Aggregate all manifest rewrite results
+ RewriteContentFileResult allManifestsResult =
+ rewriteManifestResult.values().stream()
+ .reduce(new RewriteContentFileResult(), RewriteContentFileResult::append);
// rebuild position delete files
Set<DeleteFile> deleteFiles =
- rewriteManifestResult.toRewrite().stream()
+ allManifestsResult.toRewrite().stream()
Review Comment:
Thank you, you're right: this change was unnecessary. After changing the
return type of `rewriteManifests`, I had to add another variable to hold the
aggregated `RewriteContentFileResult`. I have now renamed that variable back to
`rewriteManifestResult` and added
```
Map<String, RewriteContentFileResult> rewriteManifestResultMap
```
to hold the result of `rewriteManifests`.
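
For readers following the thread, here is a rough sketch of the naming described above: a per-manifest map, the lengths derived from it, and the aggregated result. `Result` is a hypothetical stand-in for `RewriteContentFileResult`, and the file names and lengths are made up for illustration; this is not the code in the PR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ManifestResultSketch {

  // Hypothetical stand-in for RewriteContentFileResult; only what the sketch needs.
  static final class Result {
    private final Set<String> toRewrite = new HashSet<>();
    private final long length;

    Result() {
      this(0L);
    }

    Result(long length, String... files) {
      this.length = length;
      this.toRewrite.addAll(Arrays.asList(files));
    }

    Set<String> toRewrite() {
      return toRewrite;
    }

    long length() {
      return length;
    }

    // Merges the other result's files into this one and returns this instance.
    Result append(Result other) {
      toRewrite.addAll(other.toRewrite);
      return this;
    }
  }

  // Manifest path -> rewritten manifest length, as used for the manifest-list rewrite.
  static Map<String, Long> lengths(Map<String, Result> resultMap) {
    return resultMap.entrySet().stream()
        .collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue().length()));
  }

  // Folds every per-manifest result into one aggregate.
  // Assumes a sequential stream, because the identity object is mutated by append.
  static Result aggregate(Map<String, Result> resultMap) {
    return resultMap.values().stream().reduce(new Result(), Result::append);
  }

  public static void main(String[] args) {
    Map<String, Result> rewriteManifestResultMap = new LinkedHashMap<>();
    rewriteManifestResultMap.put("m1.avro", new Result(100L, "delete-a.parquet"));
    rewriteManifestResultMap.put("m2.avro", new Result(250L, "delete-a.parquet", "delete-b.parquet"));

    Map<String, Long> rewrittenManifestLengths = lengths(rewriteManifestResultMap);
    Result rewriteManifestResult = aggregate(rewriteManifestResultMap);

    System.out.println(rewrittenManifestLengths);
    System.out.println(rewriteManifestResult.toRewrite());
  }
}
```

Keeping the aggregate under the original name `rewriteManifestResult` means the later `rewriteManifestResult.toRewrite().stream()` line would not need to change in the diff.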
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]