mbutrovich commented on PR #15470:
URL: https://github.com/apache/iceberg/pull/15470#issuecomment-4693017670

   Thanks for the feedback @szehon-ho and @steveloughran! Reworked to the 
two-phase design you proposed.
   
   The position delete rewrite no longer happens inline per manifest. It's now:
   
   1. **Enumerate** the distinct position delete files referenced by the delete 
manifests being rewritten (metadata-only).
   2. **Rewrite** those files in a dedicated distributed phase, deduped by 
path, repartitioned by delete-file count. Each task measures its rewritten 
file's size from the writer (`writer.length()`) and emits `path -> size`; the 
driver collects this into a map and broadcasts it.
   3. **Rewrite manifests** (metadata-only), stamping `file_size_in_bytes` from 
the broadcast map.
   
   Hopefully this addresses both concerns:
   
   - **Parallelism** is back to delete-file granularity rather than manifest 
count, so the expensive byte-rewrite fans across the cluster again.
   - Each delete file is rewritten **exactly once** (deduped before the phase), 
so there is no redundant rewrite or redundant copy-plan entry for files shared 
across manifests.
   - Sizes are measured **on the workers, from closed files**, so there is no 
end-of-job burst of `getLength()`/HEAD calls (@steveloughran) and no 
in-progress-write length race (@huaxingao's earlier point). The file is fully 
written and closed in its own phase before its size is read.
   
   API (no revapi diff vs `main`):
   - `RewriteTablePathUtil.rewriteDeleteManifest`: new overload takes the 
measured `Map<String, Long>` size map and is metadata-only; the 9-param 
overload is `@Deprecated` and delegates with an empty map. The per-task UUID 
staging and its Hadoop import are gone.
   - `rewritePositionDeleteFileReturningLength` promoted to `public` so the 
Spark rewrite phase can record the size; the existing `void 
rewritePositionDeleteFile` is unchanged.
   
   Tests: existing size assertions still pass. Added 
`testMultipleDistinctDeleteFileSizesAfterRewrite`, which puts two distinct 
delete files of different sizes in one manifest, to guard the per-path size 
mapping (a collapsed or mis-keyed map would stamp both entries with one size). 
Changes are on Spark 3.5 / 4.0 / 4.1; Flink and Delta callers use the 
still-present deprecated overload and are unaffected.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to