andygrove commented on code in PR #4428:
URL: https://github.com/apache/datafusion-comet/pull/4428#discussion_r3335693064

##########
README.md:
##########
@@ -58,17 +60,22 @@ See the [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contribut
 
 ## What Comet Accelerates
 
-Comet replaces Spark operators and expressions with native Rust implementations that run on Apache DataFusion.
-It uses Apache Arrow for zero-copy data transfer between the JVM and native code.
+Comet replaces Spark operators and expressions with implementations that consume and produce Apache Arrow
+batches. Most run as native Rust code on top of Apache DataFusion; some run as JVM code over Arrow batches.
+Either way the work stays in the Comet pipeline without falling back to Spark's row-based engine.
 
 - **Parquet scans**: native Parquet reader integrated with Spark's query planner
 - **Apache Iceberg**: accelerated Parquet scans when reading Iceberg tables from Spark (see the [Iceberg guide](https://datafusion.apache.org/comet/user-guide/iceberg.html))
-- **Shuffle**: native columnar shuffle with support for hash and range partitioning
+- **Shuffle**: Arrow-IPC columnar shuffle with support for hash and range partitioning, in a native Rust

Review Comment:
Yes, round-robin in native is not compatible with Spark's version, so it is opt-in. For the top level README it seems like too much information to try and include.
