shauryachats opened a new pull request, #15096:
URL: https://github.com/apache/pinot/pull/15096

   The current GAPFILL implementation expects the time column series to always 
be the first column in any subquery executed before the GAPFILL operation, 
which is not an ideal behaviour since it breaks perfectly valid SQLs like:
   
   ```
   SELECT
     occupied,
     GapFill(time_col, '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd 
HH:mm:ss.SSS',
       '2021-11-07 4:00:00.000',
       '2021-11-07 12:00:00.000', '1:HOURS',
       FILL(occupied, 'FILL_PREVIOUS_VALUE'),
       TIMESERIESON(alias_levelId, alias_lotId)),
     alias_lotId,
     alias_levelId
   FROM
     (
       SELECT
         lastWithTime(isOccupied, eventTime, 'INT') as occupied,
         cast(lotId as varchar) as alias_lotId,
         cast(levelId as varchar) as alias_levelId,
         DATETIMECONVERT(eventTime,'1:MILLISECONDS:EPOCH',
           '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS',
           '1:HOURS') AS time_col
       FROM
         parkingData
       WHERE
         eventTime >= 1636257600000 AND eventTime <= 1636286400000
       GROUP BY
         alias_levelId, time_col,alias_lotId
       LIMIT 200
     )
   LIMIT 200
   ```
   
   The current implementation of `findTimeBucketColumnIndex` simply looks for 
the GAPFILL expression in the relevant subquery and operates on the assumption 
that the time bucket column is positioned in the same index as GAPFILL when 
GAPFILL is stripped while execution. This assumption only works for queries 
where GAPFILL is present in the lowest subquery and is the first to be 
executed. It fails for queries where a subquery is executed before GAPFILL 
(AGGREGATE_GAPFILL and AGGREGATE_GAPFILL_AGGREGATE).
   
   This PR fixes this by recomputing the `timeBucketColumnIndex` for the 
aforementioned GAPFILL types using the broker response to identify the index of 
the time bucket column. It also patches the rest of the GAPFILL computation 
logic to remove the assumption that GAPFILL is the first column in the subquery.
   
   ### Testing
   
   Successfully tested that the fix works in TIMESTAMP quickstart. Also added 
multiple unit tests to make sure each possible flow is working correctly with 
the fix. 
   (Note: current unit tests did not have queries that trigger 
CountGapfillProcessor and SumAvgGapfillProcessor and this PR adds tests for 
those query shapes as well).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to