Re: [PR] Fix single-threaded bottleneck in parquet file stream processing [iceberg-rust]

via GitHub Fri, 26 Sep 2025 03:46:57 -0700


vustef commented on PR #1684:
URL: https://github.com/apache/iceberg-rust/pull/1684#issuecomment-3338088749

> Thanks @vustef for this PR. The reason we currently keep the read-to-arrow
method simple is that parallel scan depends on things like a neutral runtime,
memory management, etc. This is beyond the scope of this core crate. If you
want to read an iceberg table locally in parallel, we recommend using the
datafusion integration.

Thanks @liurenjie1024. I'm happy to drop the PR and use anything else.
Given that the datafusion integration still uses `to_arrow` method (ref
[here](https://github.com/apache/iceberg-rust/blob/ba487fc1521f40c57f809d37f4f939e12fd41845/crates/integrations/datafusion/src/physical_plan/scan.rs#L141)
and
[here](https://github.com/apache/iceberg-rust/blob/ba487fc1521f40c57f809d37f4f939e12fd41845/crates/integrations/datafusion/src/physical_plan/scan.rs#L201)),
this tells me that perhaps there's no API low-level enough for crates outside
of the core crate to parallelize the scan. That's because the work has already
been done inside the core crate by the time the items are put into the stream.

Is that right? Or do you think it'd be possible to parallelize things on the
client side of the core crate?

If not, would you be willing to open up the core crate API so that the units
of parallelism can be scheduled on different threads by the users of the core
crate?
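
To illustrate the kind of caller-side scheduling being asked about, here is a
minimal sketch in plain Rust using only the standard library. `ScanUnit`,
`process_unit` and `process_in_parallel` are hypothetical names invented for
this example and are not part of the iceberg-rust API; the sketch only assumes
that the core crate could hand out independent units of work (for example, one
per data file) before doing the CPU-bound decoding itself.

```rust
use std::thread;

/// Hypothetical stand-in for one independently readable unit of a scan
/// (for example, one data file). Not a type from iceberg-rust.
struct ScanUnit {
    rows: Vec<i64>,
}

/// Hypothetical CPU-bound step for a single unit (decode, filter, project).
/// Here it simply sums the even values so the result is easy to check.
fn process_unit(unit: &ScanUnit) -> i64 {
    unit.rows.iter().filter(|v| **v % 2 == 0).sum()
}

/// Caller-side scheduling: the units are split into at most `workers`
/// contiguous chunks and each chunk is processed on its own scoped thread.
/// Results come back in the same order as the input units.
fn process_in_parallel(units: &[ScanUnit], workers: usize) -> Vec<i64> {
    if units.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division, so that no more than `workers` chunks are produced.
    let chunk_len = (units.len() + workers - 1) / workers;

    thread::scope(|s| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = units
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(process_unit).collect::<Vec<i64>>())
            })
            .collect();

        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

The point of the sketch is only that the choice of threads (or runtime) sits
with the caller, while the per-unit work stays a plain function. A real
integration would likely use an async runtime and streams rather than OS
threads, which is the part the core crate currently decides internally.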

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]