liurenjie1024 commented on PR #1684: URL: https://github.com/apache/iceberg-rust/pull/1684#issuecomment-3346233140
> Is that right? Or do you think it'd be possible to parallelize things on the client side of the core crate?

In fact, that's not right. The intended flow is as follows:

1. (core crate) `TableScan.plan_files` splits the scan into several pieces. Each `FileScanTask` contains several parts, and each part is a portion of a large Parquet data file.
2. (external engine) The external engine parallelizes the scan by running `FileScanTask`s in parallel. In Spark, for example, each `FileScanTask` is assigned to one task.
3. (core crate) `ArrowReader` accepts a single `FileScanTask` and reads it into an Arrow data stream. This step lives in the core crate because Iceberg-specific concerns, such as type promotion and matching fields by ID, must be handled by Iceberg.
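The division of labour described above can be sketched as a small, std-only toy model. This is a hypothetical sketch, not the iceberg-rust API: `SketchScanTask`, `plan_tasks`, `read_task`, and `execute_in_parallel` are illustrative names. To keep it simple, it assumes each task covers exactly one byte range of one file, produced by a fixed `split_size`. A thread per task stands in for an engine scheduler such as Spark's. `read_task` is a placeholder for what `ArrowReader` would do (decode to Arrow, apply type promotion, match fields by ID); here it only returns the byte count.

```rust
use std::thread;

/// Hypothetical stand-in for a scan task: one byte range of one data file.
/// Illustrative only; this is NOT iceberg-rust's `FileScanTask`.
#[derive(Debug, PartialEq)]
pub struct SketchScanTask {
    pub data_file_path: String,
    pub start: u64,
    pub length: u64,
}

/// Step 1 (core-crate side, hypothetical): split each (path, size) file into
/// consecutive byte ranges of at most `split_size` bytes.
pub fn plan_tasks(files: &[(&str, u64)], split_size: u64) -> Vec<SketchScanTask> {
    assert!(split_size > 0, "split_size must be positive");
    let mut tasks = Vec::new();
    for &(path, file_size) in files {
        let mut start = 0;
        while start < file_size {
            let length = split_size.min(file_size - start);
            tasks.push(SketchScanTask {
                data_file_path: path.to_string(),
                start,
                length,
            });
            start += length;
        }
    }
    tasks
}

/// Step 3 (core-crate side, placeholder): a real reader would decode the byte
/// range into Arrow batches; this toy version just reports how many bytes it covers.
pub fn read_task(task: &SketchScanTask) -> u64 {
    task.length
}

/// Step 2 (engine side, hypothetical): run every task on its own thread and
/// collect (task, result) pairs in the original task order.
pub fn execute_in_parallel(tasks: Vec<SketchScanTask>) -> Vec<(SketchScanTask, u64)> {
    let handles: Vec<_> = tasks
        .into_iter()
        .map(|task| {
            thread::spawn(move || {
                let bytes = read_task(&task);
                (task, bytes)
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("scan task panicked"))
        .collect()
}
```

For example, `plan_tasks(&[("a.parquet", 250), ("b.parquet", 100)], 100)` yields four tasks: three ranges of `a.parquet` and one of `b.parquet`. The point of the split is that planning and reading stay in one place, while the engine only decides *where* and *when* each task runs.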
