toutane commented on PR #2298: URL: https://github.com/apache/iceberg-rust/pull/2298#issuecomment-4719387277
> Thanks for tackling this @toutane! First round of feedback. @mbutrovich thank you very much for this initial review and for your clear and insightful comments, which are essential for determining the direction of the implementation. I’ve tried to address each of your comments. Unfortunately, the PR has doubled in size due to the addition of tests, the creation of a builder for the scan node, the refactoring of the bucketing.rs module, etc. Is this a blocker? If that's the case, one option might be to split the PR into two parts: the first part on task pre-planning and the second on leveraging the physical partitioning of the Iceberg table (e.g setting DataFusion's output-partitioning to `Hash`). What do you think? Also, regarding your comment about adding a cache for tasks to avoid re-reading the metadata files if `scan()` is called multiple times, I completely agree that this optimization is important, but I think the PR is currently large enough that it justifies adding it at a later stage. I can open a tracking issue if you’re okay with that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
