Re: [I] Enable parallel file-level scanning for IcebergTableScan Datafusion Integration [iceberg-rust]

via GitHub Tue, 31 Mar 2026 05:10:27 -0700


toutane commented on issue #2220:
URL: https://github.com/apache/iceberg-rust/issues/2220#issuecomment-4162176372


   Hey 👋, I have a working POC for this, drafted in 
https://github.com/apache/iceberg-rust/pull/2298
   
   The implementation differs slightly from the approach described here: 
rather than modifying `IcebergTableScan`, it introduces a new 
`IcebergPartitionedScan` execution plan plus a dedicated 
`IcebergPartitionedTableProvider`. The provider collects all `FileScanTask`s at 
plan time, then `IcebergPartitionedScan` exposes one DataFusion partition per 
task, and each partition executes independently via `ArrowReaderBuilder`. 
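
To make the partition-per-task idea concrete, here is a minimal std-only sketch. It is not the code from the PR and uses neither the DataFusion nor the iceberg-rust API: `FileTask`, `PartitionedScan`, and their methods are hypothetical stand-ins, and reading a file is reduced to returning its path and record count.

```rust
use std::thread;

/// Hypothetical stand-in for an Iceberg file scan task: one data file to read.
#[derive(Debug, Clone)]
pub struct FileTask {
    pub path: String,
    pub record_count: usize,
}

/// Toy model of a scan plan that exposes one partition per file task.
pub struct PartitionedScan {
    tasks: Vec<FileTask>,
}

impl PartitionedScan {
    /// Tasks are collected up front, mirroring "at plan time".
    pub fn new(tasks: Vec<FileTask>) -> Self {
        Self { tasks }
    }

    /// The partition count equals the number of file tasks.
    pub fn partition_count(&self) -> usize {
        self.tasks.len()
    }

    /// "Executes" one partition by touching only its own task.
    /// A real reader would stream Arrow record batches here (assumption).
    pub fn execute(&self, partition: usize) -> Option<(String, usize)> {
        self.tasks
            .get(partition)
            .map(|t| (t.path.clone(), t.record_count))
    }

    /// Runs every partition on its own scoped thread and returns the
    /// results in partition order.
    pub fn execute_all(&self) -> Vec<(String, usize)> {
        thread::scope(|s| {
            let handles: Vec<_> = (0..self.partition_count())
                .map(|p| s.spawn(move || self.execute(p)))
                .collect();
            handles
                .into_iter()
                .filter_map(|h| h.join().expect("partition thread panicked"))
                .collect()
        })
    }
}
```

In this toy model, a scan built from two tasks reports two partitions, and `execute(i)` only ever looks at task `i`, which is what lets the engine schedule the partitions independently.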
   
   This keeps the existing `IcebergTableScan` untouched and lets users opt into 
the parallel path explicitly by registering the partitioned provider.
   
   Let me know if this direction sounds good, thanks!




