vustef commented on PR #1684: URL: https://github.com/apache/iceberg-rust/pull/1684#issuecomment-3338088749
> Thanks @vustef for this pr, the reason currently we keep reading to arrow method simple is that parallel scan depends on things like a neutral runtime, memory management, etc. This is beyond the scope of this core crate. If you want to read iceberg table locally in parallel, we recommend you to use datafusion integration.

Thanks @liurenjie1024. I'm happy to drop the PR and use something else.

Given that the datafusion integration still uses the `to_arrow` method (see [here](https://github.com/apache/iceberg-rust/blob/ba487fc1521f40c57f809d37f4f939e12fd41845/crates/integrations/datafusion/src/physical_plan/scan.rs#L141) and [here](https://github.com/apache/iceberg-rust/blob/ba487fc1521f40c57f809d37f4f939e12fd41845/crates/integrations/datafusion/src/physical_plan/scan.rs#L201)), it seems there may be no API low-level enough for crates outside the core crate to parallelize the scan, because the work has already happened inside the core crate by the time the items are put into the stream. Is that right? Or do you think it would be possible to parallelize things on the client side of the core crate?

If not, would you be willing to open up the core crate API so that users of the core crate can schedule the units of parallelism on different threads?
