mbutrovich commented on PR #15470: URL: https://github.com/apache/iceberg/pull/15470#issuecomment-4693017670
Thanks for the feedback @szehon-ho and @steveloughran! Reworked to the two-phase design you proposed. The position delete rewrite no longer happens inline per manifest. It's now: 1. **Enumerate** the distinct position delete files referenced by the delete manifests being rewritten (metadata-only). 2. **Rewrite** those files in a dedicated distributed phase, deduped by path, repartitioned by delete-file count. Each task measures its rewritten file's size from the writer (`writer.length()`) and emits `path -> size`; the driver collects this into a map and broadcasts it. 3. **Rewrite manifests** (metadata-only), stamping `file_size_in_bytes` from the broadcast map. Hopefully this addresses both concerns: - **Parallelism** is back to delete-file granularity rather than manifest count, so the expensive byte-rewrite fans across the cluster again. - Each delete file is rewritten **exactly once** (deduped before the phase), so there is no redundant rewrite or redundant copy-plan entry for files shared across manifests. - Sizes are measured **on the workers, from closed files**, so there is no end-of-job burst of `getLength()`/HEAD calls (@steveloughran) and no in-progress-write length race (@huaxingao's earlier point). The file is fully written and closed in its own phase before its size is read. API (no revapi diff vs `main`): - `RewriteTablePathUtil.rewriteDeleteManifest`: new overload takes the measured `Map<String, Long>` size map and is metadata-only; the 9-param overload is `@Deprecated` and delegates with an empty map. The per-task UUID staging and its Hadoop import are gone. - `rewritePositionDeleteFileReturningLength` promoted to `public` so the Spark rewrite phase can record the size; the existing `void rewritePositionDeleteFile` is unchanged. Tests: existing size assertions still pass. Added `testMultipleDistinctDeleteFileSizesAfterRewrite`, which puts two distinct delete files of different sizes in one manifest, to guard the per-path size mapping (a collapsed or mis-keyed map would stamp both entries with one size). Changes are on Spark 3.5 / 4.0 / 4.1; Flink and Delta callers use the still-present deprecated overload and are unaffected. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
