andygrove commented on issue #1941: URL: https://github.com/apache/datafusion-comet/issues/1941#issuecomment-4297166706
Twelve tests in `CometColumnarShuffleSuite` are currently skipped with `assume(!isSpark40Plus)` for this issue:

- `columnar shuffle on map [bool]`
- `columnar shuffle on map [byte]`
- `columnar shuffle on map [short]`
- `columnar shuffle on map [int]`
- `columnar shuffle on map [long]`
- `columnar shuffle on map [float]`
- `columnar shuffle on map [double]`
- `columnar shuffle on map [date]`
- `columnar shuffle on map [timestamp]`
- `columnar shuffle on map [decimal]`
- `columnar shuffle on map [string]`
- `columnar shuffle on map [binary]`

(The `array element` variant passes.)

On Spark 4, every failing test shows the same plan: `CometShuffleExchangeExec` is replaced by a plain `Exchange` with `hashpartitioning(mapsort(_2#N), mapsort(_3#N), ...)`.

Spark 4 wraps each map partitioning key with `MapSort` so that equal maps normalize to the same canonical key order before hashing. Comet's shuffle partitioning serde does not recognize `MapSort`, so native shuffle is rejected for every map-keyed shuffle.
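To illustrate why the map keys need normalizing before hashing, here is a minimal toy sketch in Python. It is not Spark's or Comet's code: `hash_entries`, `map_sort`, and `partition_for` are hypothetical helpers, SHA-256 stands in for whatever hash the real partitioner uses, and a map is modeled as a list of key/value pairs so that entry order is visible.

```python
import hashlib


def hash_entries(entries):
    """Hash map entries in the order given (order-sensitive stand-in hash)."""
    h = hashlib.sha256()
    for key, value in entries:
        h.update(repr((key, value)).encode("utf-8"))
    return h.hexdigest()


def map_sort(entries):
    """Toy analogue of MapSort: put entries into a canonical order by key."""
    return sorted(entries, key=lambda kv: kv[0])


def partition_for(entries, num_partitions):
    """Pick a partition from the hash of the normalized map."""
    return int(hash_entries(map_sort(entries)), 16) % num_partitions


# Two logically equal maps whose entries are stored in different orders.
a = [(1, "x"), (2, "y")]
b = [(2, "y"), (1, "x")]

# Without normalization the hashes differ, so equal maps could be sent
# to different partitions.
raw_equal = hash_entries(a) == hash_entries(b)

# After sorting by key, both maps hash identically.
sorted_equal = hash_entries(map_sort(a)) == hash_entries(map_sort(b))

same_partition = partition_for(a, 4) == partition_for(b, 4)
```

In this model, skipping the sort would break the guarantee that equal keys land in the same partition, which is presumably why a partitioning serde cannot simply ignore the wrapper and has to either support it or fall back.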
