arunkumarucet opened a new pull request, #16704: URL: https://github.com/apache/pinot/pull/16704
### **Summary**

This PR introduces a new specialized aggregation function, `IntSumAggregationFunction`, that provides optimized performance for `SUM` operations on `INT` columns by avoiding type conversion overhead and using native integer arithmetic.

### **Problem**

The existing `SUM` aggregation function converts all values to `DOUBLE` before processing, which introduces unnecessary overhead for `INT` columns. This is particularly impactful for queries that perform multiple `SUM` operations on integer data.

### **Solution**

- **New Aggregation Function**: Created `IntSumAggregationFunction`, which operates directly on `INT` values and uses `LONG` for accumulation to prevent overflow
- **Type-Specific Optimization**: Eliminates the need for `INT` → `DOUBLE` conversion, reducing CPU cycles and memory allocations
- **Proper Null Handling**: Implements correct null handling semantics that return `null` when no non-null values are processed (when null handling is enabled)
- **Comprehensive Testing**: Added extensive test coverage following Pinot's established testing patterns

### **Changes Made**

#### **Core Implementation**

- **`IntSumAggregationFunction.java`**: New specialized aggregation function for `INT` columns
- **`AggregationFunctionType.java`**: Added `INTSUM` enum constant
- **`AggregationFunctionFactory.java`**: Registered the new function type

#### **Test Coverage**

- **`IntSumAggregationFunctionTest.java`**: Comprehensive test suite covering null handling, group by operations, and edge cases
- **`AggregationFunctionFactoryTest.java`**: Added test for proper function creation
- **`AggregationFunctionTypeTest.java`**: Added test for enum recognition

### **Benefits**

1. **Performance Improvement**: Eliminates type conversion overhead for `INT` column aggregations
2. **Memory Efficiency**: Reduces unnecessary object allocations and type conversions
3. **Correctness**: Proper null handling that matches SQL semantics
4. **Maintainability**: Follows Pinot's established patterns and includes comprehensive test coverage

### **Usage**

```sql
-- Use the new optimized function for INT columns
SELECT INTSUM(ResolutionWidth) FROM hits;

-- Falls back to regular SUM for non-INT columns
SELECT SUM(ResolutionWidth) FROM hits;
```

### **Testing**

- All tests pass successfully
- Covers null handling scenarios (enabled/disabled)
- Tests group by operations (single-value and multi-value)
- Verifies proper function registration and recognition
- Follows existing Pinot test patterns for consistency
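
### **Illustrative Sketch**

To make the accumulation and null handling described under **Solution** concrete, here is a minimal standalone sketch in plain Java. It is not the code in `IntSumAggregationFunction.java` and does not use any Pinot API; the class `IntSumSketch` and its two methods are hypothetical names chosen for illustration. It only shows the two ideas the description states: `INT` values are added into a `LONG` accumulator, and the result is `null` when null handling is enabled and no non-null value was seen.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the Pinot code base.
class IntSumSketch {

    // Null handling disabled: add every int into a long accumulator.
    // Each int is widened to long before the addition, so the sum
    // cannot wrap around at Integer.MAX_VALUE.
    static long sum(int[] values) {
        long sum = 0L;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Null handling enabled: skip nulls and return null when no
    // non-null value was processed.
    static Long sumWithNullHandling(List<Integer> values) {
        long sum = 0L;
        boolean sawNonNull = false;
        for (Integer v : values) {
            if (v != null) {
                sum += v;
                sawNonNull = true;
            }
        }
        return sawNonNull ? Long.valueOf(sum) : null;
    }

    public static void main(String[] args) {
        int[] large = {Integer.MAX_VALUE, Integer.MAX_VALUE};
        System.out.println(sum(large)); // prints 4294967294

        System.out.println(sumWithNullHandling(Arrays.asList(1, null, 2)));        // prints 3
        System.out.println(sumWithNullHandling(Arrays.asList((Integer) null, null))); // prints null
    }
}
```

The first call in `main` is the case the `LONG` accumulator exists for: adding the same two values in an `int` would wrap to `-2`, while the `long` sum stays exact without going through `double`.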
