Xuanwo commented on PR #880: URL: https://github.com/apache/iceberg-rust/pull/880#issuecomment-2645818431
> It's our guess that this distinction might arise due to scanning primitives used. JanKaul/iceberg-rust leverages ParquetExec from DataFusion, which is at this point highly optimized, and probably benefits from a more favorable work distribution (e.g. a combination of more evenly spread record batches across different partition streams, scanning multiple ranges from same Parquet files in parallel Tokio tasks, more efficient pruning etc.) than get_batch_stream.

Hi, I believe that's highly possible. The existing get_batch_stream is designed for simple workloads, and I'm guessing query engines need to build their own work distribution logic instead.
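
To make the idea of engine-side work distribution concrete, here is a minimal sketch of how an engine might spread scan work evenly across partition streams. It is not the iceberg-rust or DataFusion API: `ScanTask` and `distribute` are hypothetical names, and the greedy largest-first balancing is just one simple strategy assumed for illustration.

```rust
/// Hypothetical stand-in for one unit of scan work,
/// e.g. a byte range of a Parquet file. Not a real iceberg-rust type.
#[derive(Debug, Clone, PartialEq)]
struct ScanTask {
    path: String,
    length: u64,
}

/// Greedily assign tasks to `n` partitions: take tasks largest first and
/// always place the next one on the partition with the smallest total length.
fn distribute(mut tasks: Vec<ScanTask>, n: usize) -> Vec<Vec<ScanTask>> {
    assert!(n > 0, "need at least one partition");
    let mut partitions: Vec<Vec<ScanTask>> = vec![Vec::new(); n];
    let mut loads = vec![0u64; n];

    // Largest tasks first, so small tasks can even out the remainder.
    tasks.sort_by(|a, b| b.length.cmp(&a.length));

    for task in tasks {
        // Index of the currently lightest partition (first one wins on ties).
        let (idx, _) = loads
            .iter()
            .enumerate()
            .min_by_key(|(_, load)| **load)
            .expect("n > 0, so loads is non-empty");
        loads[idx] += task.length;
        partitions[idx].push(task);
    }
    partitions
}
```

Each resulting partition could then be driven by its own stream or Tokio task. A real engine would likely also weigh row-group boundaries, pruning results, and data locality, which this sketch ignores.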