Re: [I] Add support for `MapSort` expression in Spark 4.0.0 [datafusion-comet]

via GitHub Wed, 22 Apr 2026 07:38:01 -0700


andygrove commented on issue #1941:
URL: 
https://github.com/apache/datafusion-comet/issues/1941#issuecomment-4297166706


   Twelve tests in `CometColumnarShuffleSuite` are currently skipped with 
`assume(!isSpark40Plus)` for this issue:
   
   - `columnar shuffle on map [bool]`
   - `columnar shuffle on map [byte]`
   - `columnar shuffle on map [short]`
   - `columnar shuffle on map [int]`
   - `columnar shuffle on map [long]`
   - `columnar shuffle on map [float]`
   - `columnar shuffle on map [double]`
   - `columnar shuffle on map [date]`
   - `columnar shuffle on map [timestamp]`
   - `columnar shuffle on map [decimal]`
   - `columnar shuffle on map [string]`
   - `columnar shuffle on map [binary]`
   
   (12 tests total; the `array element` variant passes.)
   
   On Spark 4, every failing test shows the same plan: 
`CometShuffleExchangeExec` is replaced by a plain `Exchange` with 
`hashpartitioning(mapsort(_2#N), mapsort(_3#N), ...)`. Spark 4 wraps each map 
partitioning key with `MapSort` so that equal maps normalize to the same 
canonical key order before hashing. Comet's shuffle partitioning serde does not 
recognize `MapSort`, so native shuffle is rejected for every map-keyed shuffle.
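
To illustrate why the map keys are normalized before hashing, here is a minimal, self-contained sketch in plain Python. It is not Spark's or Comet's implementation: `canonical_entries` and `partition_id` are hypothetical helpers, a map is modeled as a list of `(key, value)` pairs in physical order, and CRC32 stands in for whatever hash the engine actually uses.

```python
import zlib


def canonical_entries(entries):
    """Sort (key, value) pairs by key; a stand-in for the effect of MapSort."""
    return sorted(entries, key=lambda kv: kv[0])


def partition_id(entries, num_partitions):
    """Hash the entries in the order given and map the hash to a partition.

    CRC32 over the repr is an illustrative stand-in hash, chosen only
    because it is deterministic across runs.
    """
    payload = repr(list(entries)).encode("utf-8")
    return zlib.crc32(payload) % num_partitions


# Two logically equal maps whose entries are stored in different orders.
a = [(1, "x"), (2, "y")]
b = [(2, "y"), (1, "x")]

# Hashing the raw entry order sees two different byte strings, so the
# rows are not guaranteed to land in the same partition.
raw_a = partition_id(a, 8)
raw_b = partition_id(b, 8)

# After sorting by key, both maps serialize identically and therefore
# always hash to the same partition.
sorted_a = partition_id(canonical_entries(a), 8)
sorted_b = partition_id(canonical_entries(b), 8)
print(raw_a, raw_b, sorted_a, sorted_b)
```

Under these assumptions, only the sorted form guarantees that equal maps are routed to the same partition, which is why a shuffle that cannot evaluate the sort cannot safely reuse the same partitioning expression.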


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


