[GitHub] [iceberg] Xiangakun opened a new issue, #6891: Rewrite manifest action can only split large manifest file into two manifests, instead of expected target size

via GitHub Mon, 20 Feb 2023 22:11:57 -0800


Xiangakun opened a new issue, #6891:
URL: https://github.com/apache/iceberg/issues/6891


   ### Apache Iceberg version
   
   0.13.1
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   In production, we hit a large manifest file that contained data for only 
one partition but was more than 200 MB in size. After running the rewrite 
manifests action with Spark (with `targetManifestSizeBytes` set to 8 MB), two 
new manifests were written, each more than 100 MB, instead of the expected 
8 MB. (Iterating over such a large manifest significantly degrades file 
planning performance.)
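
For scale, here is a rough sketch of how many manifests the target size implies. It assumes the count is the total size divided by the target size, rounded up, and it uses 200 MiB as a stand-in for the real file size, which was somewhat larger. `expected_manifest_count` is a hypothetical helper for illustration, not an Iceberg API.

```python
def expected_manifest_count(total_size_bytes: int, target_size_bytes: int) -> int:
    """Ceiling division: manifests needed so that none exceeds the target size."""
    return (total_size_bytes + target_size_bytes - 1) // target_size_bytes


MIB = 1024 * 1024

# Assumed input: one 200 MiB manifest, rewritten with an 8 MiB target.
count = expected_manifest_count(200 * MIB, 8 * MIB)
print(count)
```

Under these assumptions the rewrite should produce on the order of 25 manifests rather than 2.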
   
   Looking at the implementation of `BaseRewriteManifestsSparkAction`, for a 
partitioned table `repartitionByRange` uses only `partitionColumn`. I think 
having `repartitionByRange` use both `partitionColumn` and `file_path` could 
solve the problem.
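
The effect can be illustrated with a simplified, pure-Python model of range partitioning. This is not Spark's actual `repartitionByRange` implementation, and it does not reproduce the exact two-file outcome described above. `range_partition` and the sample partition value and file paths are hypothetical. The sketch only shows that entries with identical sort keys cannot be separated by range boundaries, while adding `file_path` as a secondary key makes the keys distinct so they can be spread out.

```python
from bisect import bisect_left


def range_partition(keys, num_partitions):
    """Assign each key to an output partition using range boundaries.

    Simplified stand-in for range partitioning: boundaries are picked at
    evenly spaced positions in the sorted keys, so equal keys always land
    in the same output partition.
    """
    ordered = sorted(keys)
    # num_partitions - 1 upper bounds; duplicates collapse into one bound.
    bounds = sorted({ordered[(i * len(ordered)) // num_partitions]
                     for i in range(1, num_partitions)})
    return [bisect_left(bounds, key) for key in keys]


# Hypothetical manifest entries: 1000 data files, all in a single partition.
entries = [("dt=2023-02-20", f"s3://bucket/tbl/data/file-{i:05d}.parquet")
           for i in range(1000)]

# Range key = partition value only: every entry has the same key.
by_partition = range_partition([(part,) for part, _ in entries], 25)

# Range key = (partition value, file_path): every entry has a distinct key.
by_partition_and_path = range_partition(entries, 25)

print(len(set(by_partition)), len(set(by_partition_and_path)))
```

In this model the partition-only key leaves all entries in one output partition no matter how many are requested, while the composite key fills all 25.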
   
   Any advice would be appreciated.



