littleDrew opened a new issue, #12845: URL: https://github.com/apache/iceberg/issues/12845
### Query engine

#### I write to and read from an Iceberg table with Spark. I mainly do the following operations:

- Insert data with a `MERGE INTO` SQL statement, with `write.merge.mode='merge-on-read'`. This operation generates data files and delete files; almost 20,000,000 rows are updated.
- Select data with `select * from table`. This query spends **more than 420 seconds** in `planInputPartitions` (mostly in task planning inside it). I added logging to the `planInputPartitions` method; the printed log is shown below, and the corresponding code is: https://github.com/apache/iceberg/blob/0.13.x/spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchScan.java#L149

```
2025-04-18 14:32:44, 865 |INFO | [Parallel-Tasks-Thread-1] Time cost for planInputPartitions is 421444(ms).| org.apache.iceberg.spark.source.SparkBatchScan.planInputPartitions(SparkBatchScan.java:167
```

#### I would like to ask how to accelerate the `planTasks`/`planInputPartitions` step on the driver

- For example, how can I use multiple threads to plan tasks in parallel? This article mentions the idea: https://zhuanlan.zhihu.com/p/578466765

```
⑤ [Multithreaded](https://zhida.zhihu.com/search?content_id=216681278&content_type=Article&match_order=1&q=%E5%A4%9A%E7%BA%BF%E7%A8%8B&zhida_source=entity) plan task; concurrent or distributed file deletion
In early versions of Iceberg, plan task was single-threaded. When a table was very large and had a very large number of files, performance dropped sharply. The same was true of deleting files. We changed both to concurrent or distributed implementations.
```

- Or are there other ways to accelerate this step?

### Question

How can I accelerate the `planInputPartitions`/`planTasks` step?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org