This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch branch-52
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/branch-52 by this push:
new e5547e2772 [branch-52] Fix duplicate group keys after hash aggregation spill (#20724) (#20858) (#20917)
e5547e2772 is described below
commit e5547e2772fbaed693e7472f38feab690a7fe3ef
Author: Andrew Lamb <[email protected]>
AuthorDate: Fri Mar 13 14:25:33 2026 -0400
[branch-52] Fix duplicate group keys after hash aggregation spill (#20724) (#20858) (#20917)
- Part of https://github.com/apache/datafusion/issues/20855
- Closes https://github.com/apache/datafusion/issues/20724 on branch-52
This PR:
- Backports https://github.com/apache/datafusion/pull/20858 from @gboucher90 to the branch-52 line
Co-authored-by: gboucher90 <[email protected]>
---
datafusion/physical-plan/src/aggregates/row_hash.rs | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/datafusion/physical-plan/src/aggregates/row_hash.rs b/datafusion/physical-plan/src/aggregates/row_hash.rs
index 7cc59b44a3..a6fc275723 100644
--- a/datafusion/physical-plan/src/aggregates/row_hash.rs
+++ b/datafusion/physical-plan/src/aggregates/row_hash.rs
@@ -1233,6 +1233,18 @@ impl GroupedHashAggregateStream {
// on the grouping columns.
self.group_ordering =
GroupOrdering::Full(GroupOrderingFull::new());
+ // Recreate group_values to use streaming mode (GroupValuesColumn<true>
+ // with scalarized_intern) which preserves input row order, as required
+ // by GroupOrderingFull. This is only needed for multi-column group by,
+ // since single-column uses GroupValuesPrimitive which is always safe.
+ let group_schema = self
+ .spill_state
+ .merging_group_by
+ .group_schema(&self.spill_state.spill_schema)?;
+ if group_schema.fields().len() > 1 {
+ self.group_values = new_group_values(group_schema, &self.group_ordering)?;
+ }
+
// Use `OutOfMemoryMode::ReportError` from this point on
// to ensure we don't spill the spilled data to disk again.
self.oom_mode = OutOfMemoryMode::ReportError;
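
The comment in the hunk above says that GroupOrderingFull needs group indices assigned in input row order. The following is a toy sketch of why that matters, not DataFusion code: `toy_aggregate`, `Key`, and the `in_input_order` flag are hypothetical names invented for illustration, and the "scrambled" branch simply reverses the new keys to stand in for any interner that does not preserve row order. It assumes, as a simplification, that after each batch every group except the one with the highest index is treated as complete and emitted.

```rust
/// Hypothetical two-column group key used only in this sketch.
type Key = (i32, i32);

/// Counts rows per key over key-sorted batches and returns the emitted
/// (key, count) rows in emission order.
///
/// After each batch, every group except the one with the highest index is
/// emitted, on the assumption that only that last group can still receive rows.
fn toy_aggregate(batches: &[Vec<Key>], in_input_order: bool) -> Vec<(Key, usize)> {
    // Position in this Vec plays the role of the group index.
    let mut groups: Vec<(Key, usize)> = Vec::new();
    let mut emitted = Vec::new();

    for batch in batches {
        // Collect keys that have no live group yet, in first-seen order.
        let mut new_keys: Vec<Key> = Vec::new();
        for k in batch {
            if !groups.iter().any(|(g, _)| g == k) && !new_keys.contains(k) {
                new_keys.push(*k);
            }
        }
        // Assumption for illustration: a non-order-preserving interner is
        // modelled by assigning indices to the new keys in reverse order.
        if !in_input_order {
            new_keys.reverse();
        }
        groups.extend(new_keys.into_iter().map(|k| (k, 0)));

        // Accumulate one count per input row.
        for k in batch {
            let idx = groups
                .iter()
                .position(|(g, _)| g == k)
                .expect("every key in the batch was interned above");
            groups[idx].1 += 1;
        }

        // Emit all groups except the highest-index one.
        let keep = groups.len().saturating_sub(1);
        emitted.extend(groups.drain(..keep));
    }

    // End of input: flush whatever is left.
    emitted.extend(groups);
    emitted
}
```

With the sorted batches `[(1, 1), (1, 2), (2, 1)]` and `[(2, 1), (2, 2)]`, calling `toy_aggregate(&batches, true)` yields four rows with `(2, 1)` counted twice in a single row. Calling it with `false` yields five rows, and `(2, 1)` appears twice with a count of 1 each, because the still-open group was emitted after the first batch and then recreated by the second.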