Xiangakun opened a new issue, #6891: URL: https://github.com/apache/iceberg/issues/6891
### Apache Iceberg version

0.13.1

### Query engine

None

### Please describe the bug 🐞

In our production environment, we encountered a large manifest file that contained data for only one partition but was more than 200 MB in size. After running the rewrite manifests action with Spark (with `targetManifestSizeBytes` set to 8 MB), two new manifests of more than 100 MB were written instead of manifests of the expected 8 MB. (Iterating over such a large manifest file significantly slows down file planning.)

Looking at the implementation of `BaseRewriteManifestsSparkAction`, for a partitioned table `repartitionByRange` uses only the partition column. I think having `repartitionByRange` use both the partition column and `file_path` could solve the problem.

Any advice would be appreciated.
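To illustrate the suspected cause, here is a minimal sketch in plain Python. It is a toy range partitioner, not Spark's sampling-based `repartitionByRange` and not Iceberg code. The partition value, the file paths, and the helper name `range_partition` are all hypothetical. It only shows that when every entry shares the same range key, all entries land in one bucket, while adding `file_path` as a secondary key lets the boundaries fall inside the partition.

```python
from bisect import bisect_left
from collections import Counter


def range_partition(keys, num_partitions):
    """Toy range partitioner (an assumption, not Spark's algorithm).

    Boundaries are taken at evenly spaced ranks of the sorted keys, and each
    key is assigned to the first bucket whose upper bound is >= the key.
    """
    ordered = sorted(keys)
    bounds = [
        ordered[len(ordered) * i // num_partitions - 1]
        for i in range(1, num_partitions)
    ]
    return [bisect_left(bounds, key) for key in keys]


# Hypothetical manifest entries: a single partition value, many data files.
entries = [
    ("dt=2023-02-20", f"s3://bucket/table/data/file-{i:05d}.parquet")
    for i in range(1000)
]

# Range key = partition value only: every key is identical.
sizes_partition_only = Counter(range_partition([p for p, _ in entries], 4))

# Range key = (partition value, file_path): keys are distinct within the partition.
sizes_with_path = Counter(range_partition(entries, 4))

print("partition only:       ", dict(sizes_partition_only))
print("partition + file_path:", dict(sizes_with_path))
```

With the partition-only key, the toy partitioner cannot split the single partition, so the number of output buckets does not help. With the composite key, the entries spread evenly across the requested buckets.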
