andygrove opened a new pull request, #4076: URL: https://github.com/apache/datafusion-comet/pull/4076
## Which issue does this PR close?

Closes #1941.

## Rationale for this change

Spark 4.0 introduces `MapSort`, used for normalizing map values when they appear in shuffle hash partitioning keys, in `try_element_at`, and in other contexts where map ordering must be deterministic. Without native support, queries that touch maps in any of these positions fall back to Spark, which forces the entire enclosing operator off Comet (e.g. an entire shuffle exchange).

## What changes are included in this PR?

- New native scalar function `map_sort` in `native/spark-expr/src/map_funcs/map_sort.rs` that sorts map entries by key in ascending order, registered via `comet_scalar_funcs.rs`.
- Wire `MapSort` into the Spark 4.0 `CometExprShim` so the expression is converted to the new scalar function during serde.
- The `columnar shuffle on map array element` test in `CometColumnarShuffleSuite` now expects shuffle fallback on Spark 4.0+: the new shuffle-key normalization wraps `mapsort` inside `transform(arr, x -> mapsort(x))`, and Comet does not currently support `ArrayTransform` with a lambda body. Answer correctness is still verified via `checkSparkAnswer`.

## How are these changes tested?

- New unit tests in `native/spark-expr/src/map_funcs/map_sort.rs` cover sorting on each supported key type, null handling, and empty maps.
- Existing `CometColumnarShuffleSuite` tests for map shuffle keys all pass under the Spark 4.0 profile (41/41).
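
For readers unfamiliar with the semantics, the key-ordering behaviour described for `map_sort` can be sketched in plain Rust. This is only an illustration under stated assumptions, not the code in `map_sort.rs`: the real function operates on Arrow arrays, while this sketch models a map as a `Vec<(K, V)>` and a nullable map column as `Vec<Option<...>>`. The helper names `sort_map_entries` and `map_sort_column` are hypothetical, and passing null maps through unchanged is an assumption about the null handling.

```rust
/// Sort the entries of one map value by key in ascending order.
/// Hypothetical helper for illustration; a map is modelled as a list of
/// (key, value) pairs rather than an Arrow MapArray.
fn sort_map_entries<K: Ord, V>(mut entries: Vec<(K, V)>) -> Vec<(K, V)> {
    // Compare on the key only; values are carried along untouched.
    entries.sort_by(|a, b| a.0.cmp(&b.0));
    entries
}

/// Apply the sort to every row of a nullable map column.
/// Assumption: a null map (None) stays null, and an empty map stays empty.
fn map_sort_column<K: Ord, V>(
    column: Vec<Option<Vec<(K, V)>>>,
) -> Vec<Option<Vec<(K, V)>>> {
    column
        .into_iter()
        .map(|row| row.map(sort_map_entries))
        .collect()
}
```

Under this model, two maps holding the same entries in different insertion order normalize to the same sequence, which is the property hash partitioning on a map key relies on.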
