mirageyjd opened a new issue, #8932:
URL: https://github.com/apache/iceberg/issues/8932

   ### Apache Iceberg version
   
   0.13.1
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   We ran the `BaseRewriteManifestsSparkAction` on a large table with 7k+ 
manifests in Spark, and it unexpectedly took more than an hour. The most 
time-consuming step is validating that each manifest entry in the added 
manifests has a snapshot id, and this step is not executed in a distributed 
manner. Without the validation, the entire action takes less than 2 minutes.
   
   I wonder whether it is necessary to validate the snapshot id of each manifest 
entry in manifests written by `BaseRewriteManifestsSparkAction`. It would be 
better if this validation were optional and could be skipped in 
`BaseRewriteManifestsSparkAction`.
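
   To make the request concrete, here is a minimal Python sketch of the idea. It is not Iceberg code: `ManifestEntry`, `Manifest`, `validate_added_manifests`, and the `validate` and `parallelism` parameters are all hypothetical names chosen for illustration. It only models the check described above (every entry in an added manifest must carry a snapshot id) and shows two ways the cost could be reduced: skipping the check, or running it per manifest concurrently instead of in one sequential loop.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ManifestEntry:
    # Hypothetical stand-in for a manifest entry; not Iceberg's class.
    path: str
    snapshot_id: Optional[int]


@dataclass(frozen=True)
class Manifest:
    # Hypothetical stand-in for a manifest file and the entries it lists.
    path: str
    entries: List[ManifestEntry]


def manifest_is_valid(manifest: Manifest) -> bool:
    """Return True when every entry in the manifest carries a snapshot id."""
    return all(entry.snapshot_id is not None for entry in manifest.entries)


def validate_added_manifests(
    manifests: List[Manifest],
    validate: bool = True,
    parallelism: int = 1,
) -> List[str]:
    """Return the paths of manifests that contain an entry without a snapshot id.

    validate=False skips the check entirely (assumed to be safe only when the
    writer is trusted to have set the snapshot id on every entry).
    parallelism > 1 checks manifests concurrently instead of one by one.
    """
    if not validate:
        return []
    if parallelism <= 1:
        results = [manifest_is_valid(m) for m in manifests]
    else:
        with ThreadPoolExecutor(max_workers=parallelism) as pool:
            # pool.map preserves input order, so results line up with manifests.
            results = list(pool.map(manifest_is_valid, manifests))
    return [m.path for m, ok in zip(manifests, results) if not ok]
```

   In this sketch, skipping the check is only reasonable under the assumption that the action itself always writes the snapshot id into the entries it produces; whether that holds for `BaseRewriteManifestsSparkAction` is exactly the question raised above.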


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

