Copilot commented on code in PR #21984:
URL: https://github.com/apache/datafusion/pull/21984#discussion_r3176272335
##########
datafusion/spark/src/function/aggregate/collect.rs:
##########
@@ -85,7 +85,7 @@ impl AggregateUDFImpl for SparkCollectList {
let data_type = field.data_type().clone();
let ignore_nulls = true;
Ok(Box::new(NullToEmptyListAccumulator::new(
- ArrayAggAccumulator::try_new(&data_type, ignore_nulls)?,
+ ArrayAggAccumulator::try_new(field, ignore_nulls)?,
data_type,
)))
Review Comment:
`collect_list` now passes `field` into `ArrayAggAccumulator`, but metadata
can still be dropped when `NullToEmptyListAccumulator` converts a NULL result
into an empty list: it builds the empty list from `self.data_type` (a
`DataType` with no field metadata) via
`SingleRowListArrayBuilder::new(empty_array)`, so extension-type metadata won’t
round-trip for the all-NULL case. Consider storing the input `FieldRef` (or at
least the planned list inner `Field`) in `NullToEmptyListAccumulator` and using
`SingleRowListArrayBuilder::with_field(...)` (or `list_inner_field_from`) when
constructing the empty list. Also consider overriding `return_field`/updating
`state_fields` for `collect_list` to propagate inner-field metadata at planning
time (similar to `array_agg`).
##########
datafusion/spark/src/function/aggregate/collect.rs:
##########
@@ -143,7 +143,7 @@ impl AggregateUDFImpl for SparkCollectSet {
let data_type = field.data_type().clone();
let ignore_nulls = true;
Ok(Box::new(NullToEmptyListAccumulator::new(
- DistinctArrayAggAccumulator::try_new(&data_type, None,
ignore_nulls)?,
+ DistinctArrayAggAccumulator::try_new(field, None, ignore_nulls)?,
data_type,
)))
Review Comment:
Same issue as `collect_list`: when `collect_set` produces a NULL result and
`NullToEmptyListAccumulator` converts it to an empty list, the empty list is
built without access to the input `FieldRef` metadata, so extension-type
metadata is lost for the all-NULL case. Consider threading the input `FieldRef`
(or planned inner `Field`) through the wrapper and using it when building the
empty list scalar; also consider overriding `return_field`/updating
`state_fields` so planning-time types retain inner metadata.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]