Re: [I] Mismatch in Data Types Between Partial and Final Stages in Aggregations [datafusion-comet]

via GitHub Tue, 02 Jun 2026 07:46:59 -0700


andygrove commented on issue #1287:
URL: 
https://github.com/apache/datafusion-comet/issues/1287#issuecomment-4603587196


   This appears to be resolved. The codebase adopted the first remedy proposed 
here ("Always unpack string dictionaries during ScanExec"):
   
   - `ScanExec::new` now builds its schema from unpacked data types (`scan.rs`: 
*"Build schema directly from data types since get_next now always unpacks 
dictionaries"*) via `schema_from_data_types` → `unpack_dictionary_type`.
   - `ScanExec::get_next` unconditionally unpacks each array 
(`copy_or_unpack_array(..., UnpackOrClone)`).
   
   So the partial stage no longer emits dictionary types into aggregation, and 
the partial/final stages now agree on the plain (e.g. `Utf8`) type. The 
concrete reproducer (`max` on a dictionary-encoded string column) no longer 
applies. This was further hardened by #2635, which made the fuzz generator 
produce dictionary-encoded string arrays and fixed the bugs that exposed, 
giving us regression coverage for this class of mismatch.
   
   One caveat for completeness: the unpacking is top-level only 
(`copy_or_unpack_array` / `unpack_dictionary_type` match a 
`DataType::Dictionary` at the column root), so a dictionary nested inside a 
`List`/`Struct` would not be unpacked. In practice Comet's Parquet/exchange 
path dictionary-encodes at the top level, so this is theoretical. If anyone can 
produce a nested-dictionary input that reaches aggregation, that's worth a 
fresh, narrowly-scoped issue.
   
   Closing as resolved by #2635. Please reopen if the mismatch is still 
reproducible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] Mismatch in Data Types Between Partial and Final Stages in Aggregations [datafusion-comet]

Reply via email to