kevinjqliu commented on issue #1479:
URL: 
https://github.com/apache/iceberg-python/issues/1479#issuecomment-2568436698

   > there is no noticeable time difference between single-threaded and 
multi-threaded execution. The total time is directly proportional to the number 
of manifest entries.
   
   Could you print out `ExecutorFactory.max_workers()` to double-check the 
value? 
   
   > For instance, consider a scenario with 6 manifest files, each containing 
7,000 entries. With max-workers=32, the code spawns 6 threads, each completing 
after approximately 30 seconds concurrently. In contrast, with max-workers=1, 
the code processes the manifest files sequentially, yet still finishes in 
roughly 30 seconds.
   
   There's already some discussion around this in #1229. The issue might be 
with I/O-bound tasks and the Python GIL. Can you give `ProcessPoolExecutor` a 
try? 
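
A minimal sketch of how the GIL effect could be checked in isolation, using only the standard library. This is not PyIceberg code: `parse_manifest` and `timed_run` are hypothetical names, and the loop is just an assumed stand-in for CPU-heavy manifest decoding. If the real work behaves like this, threads with `max_workers=1` and `max_workers=6` should take roughly the same wall-clock time.

```python
import time
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List, Tuple, Type


def parse_manifest(num_entries: int) -> int:
    """Hypothetical stand-in for decoding one manifest: pure-Python, CPU-bound."""
    checksum = 0
    for i in range(num_entries):
        checksum = (checksum + i * i) % 1_000_003
    return checksum


def timed_run(
    executor_cls: Type[Executor], max_workers: int, sizes: List[int]
) -> Tuple[List[int], float]:
    """Run parse_manifest over `sizes` on the given executor type and time it."""
    start = time.perf_counter()
    with executor_cls(max_workers=max_workers) as pool:
        results = list(pool.map(parse_manifest, sizes))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Six "manifests", mirroring the scenario quoted above (sizes are arbitrary).
    sizes = [200_000] * 6
    for workers in (1, 6):
        _, elapsed = timed_run(ThreadPoolExecutor, workers, sizes)
        print(f"threads, max_workers={workers}: {elapsed:.2f}s")
```

Because `timed_run` takes the executor class as a parameter, the same call with `ProcessPoolExecutor` (imported from `concurrent.futures` and run from a script under the `if __name__ == "__main__":` guard) gives the process-based timing for comparison.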
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

