vustef commented on PR #1684:
URL: https://github.com/apache/iceberg-rust/pull/1684#issuecomment-3338088749

   > Thanks @vustef for this PR. The reason we currently keep the read-to-arrow 
method simple is that parallel scan depends on things like a neutral runtime, 
memory management, etc. This is beyond the scope of this core crate. If you want 
to read an iceberg table locally in parallel, we recommend you use the datafusion 
integration.
   
   Thanks @liurenjie1024. I'm happy to drop the PR and use anything else. 
Given that the datafusion integration still uses the `to_arrow` method (ref 
[here](https://github.com/apache/iceberg-rust/blob/ba487fc1521f40c57f809d37f4f939e12fd41845/crates/integrations/datafusion/src/physical_plan/scan.rs#L141)
 and  
[here](https://github.com/apache/iceberg-rust/blob/ba487fc1521f40c57f809d37f4f939e12fd41845/crates/integrations/datafusion/src/physical_plan/scan.rs#L201)),
 this tells me that perhaps there's no API low-level enough for crates outside 
of the core crate to parallelize the scan. That's because the work already 
happens inside the core crate by the time the items are put into the stream.
   
   Is that right? Or do you think it'd be possible to parallelize things on the 
client side of the core crate?
   
   If not, would you be willing to open up the core crate API so that the units 
of parallelism can be scheduled on different threads by the users of the core 
crate?
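   
   To make "units of parallelism" concrete, here is a minimal sketch of what 
I'd like to be able to do on the client side. It uses only the standard 
library; `ScanTask`, `read_task` and `read_parallel` are hypothetical names 
standing in for whatever the core crate might expose, not existing 
iceberg-rust APIs.

```rust
use std::thread;

/// Hypothetical unit of parallelism, e.g. one data file to read.
#[derive(Debug, Clone)]
struct ScanTask {
    path: String,
    rows: usize,
}

/// Hypothetical stand-in for reading one task. A real reader would
/// produce record batches; here we only report the path and row count.
fn read_task(task: &ScanTask) -> (String, usize) {
    (task.path.clone(), task.rows)
}

/// Splits the tasks into contiguous chunks and reads each chunk on its
/// own scoped thread. Results come back in the original task order.
fn read_parallel(tasks: &[ScanTask], workers: usize) -> Vec<(String, usize)> {
    let workers = workers.max(1);
    // Ceiling division; at least 1 so `chunks` never receives 0.
    let chunk = ((tasks.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = tasks
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(read_task).collect::<Vec<_>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

   The point is only that the caller, not the core crate, decides how the 
tasks are mapped onto threads (or onto an async runtime of its choice).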



