Re: [I] [Consult] planTask tasks a lot of time, consult for how to accelerate this [iceberg]

via GitHub Wed, 23 Apr 2025 02:33:16 -0700


Fokko commented on issue #12845:
URL: https://github.com/apache/iceberg/issues/12845#issuecomment-2823677466


   Hey @littleDrew, as @wzx140 already indicated, Spark uses distributed
planning. The problem is likely that the table has a lot of metadata and data,
which makes query planning slow. I see that you refer to the code of Iceberg
0.13; if you're still on that version, I would highly recommend upgrading to a
more recent one. Distributed planning was added in Iceberg 1.4.0 for Spark 3.4
and 3.5.
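   A minimal sketch of how the planning mode could be pinned per table, assuming Iceberg 1.4.0 or later on Spark 3.4/3.5 and the table properties `read.data-planning-mode` / `read.delete-planning-mode` with the values `auto`, `local` and `distributed` (check the table-properties docs for your exact version). The helper `planning_mode_sql` and the table `my_catalog.db.events` are hypothetical; the function only builds the SQL text, which you would then pass to `spark.sql(...)`.

```python
# Sketch only: builds the Spark SQL statement that pins an Iceberg table's
# planning mode. Assumes Iceberg >= 1.4.0 (Spark 3.4/3.5) and the table
# properties "read.data-planning-mode" / "read.delete-planning-mode".

PLANNING_MODES = {"auto", "local", "distributed"}


def planning_mode_sql(table: str, mode: str = "distributed") -> str:
    """Return an ALTER TABLE statement setting both data and delete planning modes."""
    if mode not in PLANNING_MODES:
        raise ValueError(f"unknown planning mode: {mode!r}")
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'read.data-planning-mode' = '{mode}', "
        f"'read.delete-planning-mode' = '{mode}')"
    )


if __name__ == "__main__":
    # "my_catalog.db.events" is a hypothetical table name.
    stmt = planning_mode_sql("my_catalog.db.events")
    print(stmt)
    # With a live SparkSession you would run: spark.sql(stmt)
```

Keeping `auto` lets Iceberg choose; forcing `distributed` is only worth trying if you suspect planning is running locally on the driver.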
   
   Another thing to consider is table maintenance: rewriting
[data files](https://iceberg.apache.org/docs/nightly/spark-procedures/#rewrite_data_files)
and
[manifests](https://iceberg.apache.org/docs/nightly/spark-procedures/#rewrite_manifests)
can speed up queries quite a bit.
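   The two procedures linked above are invoked through Spark SQL `CALL` statements. A minimal sketch, assuming the Iceberg Spark SQL extensions are enabled and only the `table` argument of each procedure is used (so `rewrite_data_files` falls back to its default bin-pack strategy); the helper `maintenance_sql` and the catalog/table names are hypothetical.

```python
# Sketch only: builds the CALL statements for the Iceberg maintenance
# procedures rewrite_data_files and rewrite_manifests. Names are hypothetical.
from typing import List


def maintenance_sql(catalog: str, table: str) -> List[str]:
    """Return CALL statements that compact data files, then rewrite manifests."""
    return [
        # Compact small data files into larger ones (default strategy: binpack).
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Rewrite manifests so scan planning has fewer, better-grouped ones to read.
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
    ]


if __name__ == "__main__":
    # With a live SparkSession: for stmt in ...: spark.sql(stmt)
    for stmt in maintenance_sql("my_catalog", "db.events"):
        print(stmt)
```

The ordering is a judgment call: compacting data files commits new manifests, so rewriting manifests afterwards tidies up the metadata that compaction just produced.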


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


