xiangfu0 opened a new pull request, #17153:
URL: https://github.com/apache/pinot/pull/17153
### Summary
- Introduces ArrayAggMv, an extension of ArrayAgg that aggregates
multi-value columns by flattening per-row arrays into a single resulting array.
- Supports optional distinct aggregation to deduplicate values.
### SQL usage
- Non-distinct: ArrayAggMv(columnA, 'STRING')
- Distinct: ArrayAggMv(columnA, 'STRING', true)
### Behavior
- For MV input rows like {colA: ['A', 'B']}, {colA: ['C', 'B']}:
  - ArrayAggMv(colA, 'STRING') => ['A','B','C','B']
  - ArrayAggMv(colA, 'STRING', true) => ['A','B','C']
- Null handling is consistent with ArrayAgg: when null handling is enabled,
nulls are skipped; when it is disabled, sentinel values are included for
numeric types.
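
The flattening behavior above can be illustrated with a small Python sketch. This is not Pinot code: `array_agg_mv` is a hypothetical helper that only models the semantics described in this section. The first-seen ordering of the distinct result is an assumption chosen to match the example; the PR does not state an ordering guarantee.

```python
from typing import Iterable, List, Sequence


def array_agg_mv(rows: Iterable[Sequence[str]], distinct: bool = False) -> List[str]:
    """Flatten per-row multi-value arrays into one list (hypothetical helper)."""
    flattened = [value for row in rows for value in row]
    if distinct:
        # dict.fromkeys keeps first-seen order. This is an assumption made to
        # mirror the example, not a statement about Pinot's output order.
        return list(dict.fromkeys(flattened))
    return flattened


rows = [["A", "B"], ["C", "B"]]
print(array_agg_mv(rows))                  # non-distinct: duplicates kept
print(array_agg_mv(rows, distinct=True))   # distinct: duplicates removed
```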
### Implementation
- Added ARRAYAGGMV to AggregationFunctionType with proper Calcite
registration for (ARRAY, CHAR, BOOL) argument types.
- Implemented MV base classes per type:
  - BaseArrayAggMvIntFunction, BaseArrayAggMvLongFunction,
    BaseArrayAggMvFloatFunction, BaseArrayAggMvDoubleFunction,
    BaseArrayAggMvStringFunction
- Implemented concrete functions (non-distinct and distinct):
  - ArrayAggMvInt/Long/Float/Double/StringFunction
  - ArrayAggMvDistinctInt/Long/Float/Double/StringFunction
- Wired AggregationFunctionFactory to parse and instantiate ArrayAggMv(...)
with an optional isDistinct argument.
- Serialization and deserialization of intermediate results use the existing
ObjectSerDeUtils.
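
To make the role of intermediate results concrete, here is a minimal Python sketch of how per-segment results might be merged. The PR does not describe the intermediate data structures, so the list (non-distinct) and set (distinct) representations and both function names are assumptions for illustration only.

```python
from typing import List, Set


def merge_non_distinct(left: List[int], right: List[int]) -> List[int]:
    # Non-distinct: every value is kept, so the merged size is the sum
    # of the two input sizes.
    return left + right


def merge_distinct(left: Set[int], right: Set[int]) -> Set[int]:
    # Distinct: a set union, so values seen in both segments appear once.
    return left | right


segment_a = [1, 2, 2]   # hypothetical flattened values from one segment
segment_b = [2, 3]      # hypothetical flattened values from another segment

merged = merge_non_distinct(segment_a, segment_b)
merged_distinct = merge_distinct(set(segment_a), set(segment_b))
print(len(merged), sorted(merged_distinct))
```

Under these assumptions, the merged non-distinct size grows with every segment while the distinct size is bounded by the number of unique values, which is the kind of difference the size-based query tests would observe.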
### Tests
- Unit: ArrayAggMvFunctionTest (multi-block aggregation, distinct merging,
group-by paths, ser/de).
- Queries: ArrayAggMvQueriesTest (inner-segment sizes and broker-merged
sizes for non-distinct/distinct).
- Integration: Extended ArrayTest with:
  - testArrayAggMvQueries
  - testArrayAggMvDistinctQueries
  - testArrayAggMvGroupByQueries
  - testArrayAggMvDistinctGroupByQueries
### Notes
- BOOLEAN maps to the INT variant, aligning with ArrayAgg behavior.
- TIMESTAMP is supported via the LONG-typed functions.
- No changes to existing ArrayAgg behavior.
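
The type mapping in these notes can be sketched as a lookup from the declared data type to a function class name. The class names come from the Implementation section; the `VARIANT_BY_TYPE` table and `resolve_function_name` are hypothetical and only approximate what the Java AggregationFunctionFactory wiring might do.

```python
# Hypothetical mapping from the declared data type to the typed variant.
# BOOLEAN -> INT and TIMESTAMP -> LONG follow the notes above.
VARIANT_BY_TYPE = {
    "INT": "Int",
    "BOOLEAN": "Int",
    "LONG": "Long",
    "TIMESTAMP": "Long",
    "FLOAT": "Float",
    "DOUBLE": "Double",
    "STRING": "String",
}


def resolve_function_name(data_type: str, distinct: bool = False) -> str:
    """Return the class name that would handle the given data type."""
    variant = VARIANT_BY_TYPE.get(data_type.upper())
    if variant is None:
        raise ValueError(f"Unsupported data type for ArrayAggMv: {data_type}")
    prefix = "ArrayAggMvDistinct" if distinct else "ArrayAggMv"
    return f"{prefix}{variant}Function"


print(resolve_function_name("BOOLEAN"))
print(resolve_function_name("TIMESTAMP", distinct=True))
```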
### Examples
- SELECT ArrayAggMv(tags, 'STRING') FROM t
- SELECT ArrayAggMv(tags, 'STRING', true) FROM t
- SELECT ArrayAggMv(tags, 'STRING'), groupKey FROM t GROUP BY groupKey
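
For the GROUP BY form, the expected result shape can be modeled in Python as one flattened array per group key. `array_agg_mv_group_by` and the sample rows are hypothetical; the sketch models the semantics only, and the ordering within each group is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def array_agg_mv_group_by(
    rows: Iterable[Tuple[str, Sequence[str]]], distinct: bool = False
) -> Dict[str, List[str]]:
    """Flatten the multi-value column separately for each group key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for group_key, tags in rows:
        groups[group_key].extend(tags)
    if distinct:
        # First-seen order per group is an assumption, not a Pinot guarantee.
        return {key: list(dict.fromkeys(values)) for key, values in groups.items()}
    return dict(groups)


# Hypothetical rows of (groupKey, tags)
rows = [("g1", ["A", "B"]), ("g2", ["C"]), ("g1", ["B", "D"])]
print(array_agg_mv_group_by(rows))
print(array_agg_mv_group_by(rows, distinct=True))
```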
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]