shauryachats opened a new pull request, #15096: URL: https://github.com/apache/pinot/pull/15096
The current GAPFILL implementation expects the time column series to always be the first column in any subquery executed before the GAPFILL operation, which is not an ideal behaviour since it breaks perfectly valid SQLs like: ``` SELECT occupied, GapFill(time_col, '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS', '2021-11-07 4:00:00.000', '2021-11-07 12:00:00.000', '1:HOURS', FILL(occupied, 'FILL_PREVIOUS_VALUE'), TIMESERIESON(alias_levelId, alias_lotId)), alias_lotId, alias_levelId FROM ( SELECT lastWithTime(isOccupied, eventTime, 'INT') as occupied, cast(lotId as varchar) as alias_lotId, cast(levelId as varchar) as alias_levelId, DATETIMECONVERT(eventTime,'1:MILLISECONDS:EPOCH', '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS', '1:HOURS') AS time_col FROM parkingData WHERE eventTime >= 1636257600000 AND eventTime <= 1636286400000 GROUP BY alias_levelId, time_col,alias_lotId LIMIT 200 ) LIMIT 200 ``` The current implementation of `findTimeBucketColumnIndex` simply looks for the GAPFILL expression in the relevant subquery and operates on the assumption that the time bucket column is positioned in the same index as GAPFILL when GAPFILL is stripped while execution. This assumption only works for queries where GAPFILL is present in the lowest subquery and is the first to be executed. It fails for queries where a subquery is executed before GAPFILL (AGGREGATE_GAPFILL and AGGREGATE_GAPFILL_AGGREGATE). This PR fixes this by recomputing the `timeBucketColumnIndex` for the aforementioned GAPFILL types using the broker response to identify the index of the time bucket column. It also patches the rest of the GAPFILL computation logic to remove the assumption that GAPFILL is the first column in the subquery. ### Testing Successfully tested that the fix works in TIMESTAMP quickstart. Also added multiple unit tests to make sure each possible flow is working correctly with the fix. (Note: current unit tests did not have queries that trigger CountGapfillProcessor and SumAvgGapfillProcessor and this PR adds tests for those query shapes as well). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org