Re: [PR] feat!(datafusion): enable parallel file scanning with eager task bucketing [iceberg-rust]

via GitHub Tue, 16 Jun 2026 06:41:08 -0700


toutane commented on PR #2298:
URL: https://github.com/apache/iceberg-rust/pull/2298#issuecomment-4719387277


   > Thanks for tackling this @toutane! First round of feedback.
   
   @mbutrovich thank you very much for this initial review and for your clear 
and insightful comments, which are essential for determining the direction of 
the implementation.
   
   I’ve tried to address each of your comments. Unfortunately, the PR has 
doubled in size due to the addition of tests, the creation of a builder for the 
scan node, the refactoring of the bucketing.rs module, etc. Is this a blocker? 
If that's the case, one option might be to split the PR into two parts: the 
first part on task pre-planning and the second on leveraging the physical 
partitioning of the Iceberg table (e.g setting DataFusion's output-partitioning 
to `Hash`). What do you think?
   
   Also, regarding your comment about adding a cache for tasks to avoid 
re-reading the metadata files if `scan()` is called multiple times, I 
completely agree that this optimization is important, but I think the PR is 
currently large enough that it justifies adding it at a later stage. I can open 
a tracking issue if you’re okay with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat!(datafusion): enable parallel file scanning with eager task bucketing [iceberg-rust]

Reply via email to