yashmayya commented on code in PR #14264: URL: https://github.com/apache/pinot/pull/14264#discussion_r1818981343
########## pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/value/FirstValueWindowFunction.java: ########## @@ -78,12 +86,78 @@ private List<Object> processRowsWindow(List<Object[]> rows) { lowerBound++; upperBound = Math.min(upperBound + 1, numRows - 1); } + + return result; + } + + private List<Object> processRowsWindowIgnoreNulls(List<Object[]> rows) { + int numRows = rows.size(); + int lowerBound = _windowFrame.getLowerBound(); + int upperBound = Math.min(_windowFrame.getUpperBound(), numRows - 1); + + int indexOfFirstNonNullValue = -1; + // Find first non-null value in the first window + for (int i = Math.max(lowerBound, 0); i <= upperBound; i++) { + Object value = extractValueFromRow(rows.get(i)); + if (value != null) { + indexOfFirstNonNullValue = i; + break; + } + } + + List<Object> result = new ArrayList<>(numRows); + + for (int i = 0; i < numRows; i++) { + if (lowerBound >= numRows) { + // Fill the remaining rows with null since all subsequent windows will be out of bounds + for (int j = i; j < numRows; j++) { + result.add(null); + } + break; + } + + if (indexOfFirstNonNullValue != -1) { + result.add(extractValueFromRow(rows.get(indexOfFirstNonNullValue))); + } else { + result.add(null); + } + + // Slide the window forward by one row; check if indexOfFirstNonNullValue is the lower bound which will not be in + // the next window. If so, find the next non-null value. + if (lowerBound >= 0 && indexOfFirstNonNullValue == lowerBound) { + indexOfFirstNonNullValue = -1; + // Find first non-null value for the next window + for (int j = lowerBound + 1; j <= Math.min(upperBound + 1, numRows - 1); j++) { + Object value = extractValueFromRow(rows.get(j)); + if (value != null) { + indexOfFirstNonNullValue = j; + break; + } + } + } + lowerBound++; Review Comment: Yep, this convention was added in https://github.com/apache/pinot/pull/14273, and was chosen since it was the least invasive given the existing framework for window functions. I've documented this in a couple of places - https://github.com/apache/pinot/blob/0f984e81d3fce0d9d5bafb950cbbcf4b4dbc6763/pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/WindowNode.java#L33-L37 https://github.com/apache/pinot/blob/0f984e81d3fce0d9d5bafb950cbbcf4b4dbc6763/pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/WindowFrame.java#L31-L35 > Although it can never turn into 0 due to the fact that maxRows is bound to Integer.MAX_VALUE... don't you think it is a bit difficult to understand? The logic here doesn't take that assumption into account though, we're handling all cases of `lowerBound` / `upperBound` (-ve / +ve). This way is also convenient because we don't need to worry about handling overflows everywhere. ########## pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/NullHandlingIntegrationTest.java: ########## @@ -326,6 +326,31 @@ public void testAggregateServerReturnFinalResult(boolean useMultiStageQueryEngin assertTrue(response.get("resultTable").get("rows").get(0).get(0).isNull()); } + @Test + public void testWindowFunctionIgnoreNulls() + throws Exception { + // Window functions are only supported in the multi-stage query engine + setUseMultiStageQueryEngine(true); + String sqlQuery = + "SELECT salary, LAST_VALUE(salary) IGNORE NULLS OVER (ORDER BY DaysSinceEpoch) AS gapfilledSalary from " Review Comment: Yes, actually this does look like a legitimate bug in Calcite on taking a second look. The issue is that the inferred return type for the window function in the parsed `SqlNode` is `INTEGER NOT NULL` when `IGNORE NULLS` option is used (assuming the column input to `FIRST_VALUE` / `LAST_VALUE` is `INTEGER`) and the converted return type is `INTEGER` (nullable) because offset based window frame bounds means that the result can be `null` when the window frame is out of bounds - which can't be the case when using `RANGE` window frames or `ROWS` window frames with `UNBOUNDED PRECEDING` / `UNBOUNDED FOLLOWING` / `CURRENT ROW`. When `IGNORE NULLS` option is not provided, the inferred return type for the window function in the parsed `SqlNode` is also `INTEGER` (nullable) which is why the issue doesn't occur there. Same when `IGNORE NULLS` option is used but input column is nullable. Interestingly, the same error and issue also occurs when the `RESPECT NULLS` option is explicitly provided. I'll create a bug tracking Jira in the Calcite project and link it here. ########## pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/value/FirstValueWindowFunction.java: ########## @@ -113,12 +187,115 @@ private List<Object> processRangeWindow(List<Object[]> rows) { return result; } - private List<Object> processUnboundedPreceding(List<Object[]> rows) { + private List<Object> processRangeWindowIgnoreNulls(List<Object[]> rows) { int numRows = rows.size(); - assert numRows > 0; - Object value = extractValueFromRow(rows.get(0)); - Object[] result = new Object[numRows]; - Arrays.fill(result, value); - return Arrays.asList(result); + + if (_windowFrame.isUnboundedPreceding() && _windowFrame.isUnboundedFollowing()) { + // Find the first non-null value and fill it in all rows + for (int i = 0; i < numRows; i++) { + Object[] row = rows.get(i); + Object value = extractValueFromRow(row); + if (value != null) { + return fillAllWithValue(rows, value); + } + } + // There's no non-null value + return Collections.nCopies(numRows, null); + } + + if (_windowFrame.isUnboundedPreceding() && _windowFrame.isUpperBoundCurrentRow()) { + // Find the first non-null value and fill it in all rows starting from the first row of the peer group of the row + // with the first non-null value + int firstNonNullValueIndex = -1; + Key firstNonNullValueKey = null; + for (int i = 0; i < numRows; i++) { + Object[] row = rows.get(i); + Object value = extractValueFromRow(row); + if (value != null) { + firstNonNullValueIndex = i; + firstNonNullValueKey = AggregationUtils.extractRowKey(row, _orderKeys); + break; + } + } + + // No non-null values + if (firstNonNullValueIndex == -1) { + return Collections.nCopies(numRows, null); + } + + List<Object> result = new ArrayList<>(numRows); + // Find the start of the peer group of the row with the first non-null value + int i; + for (i = 0; i < numRows; i++) { + Object[] row = rows.get(i); + Key orderKey = AggregationUtils.extractRowKey(row, _orderKeys); + if (orderKey.equals(firstNonNullValueKey)) { + break; + } else { + result.add(null); + } + } + + Object firstNonNullValue = extractValueFromRow(rows.get(firstNonNullValueIndex)); + for (; i < numRows; i++) { + result.add(firstNonNullValue); + } + + return result; Review Comment: Nice suggestion! I couldn't find any such off the shelf implementation either so I've created a small new class called `DualValueList` and added it to `pinot-common` in case we find use cases elsewhere in the codebase. ########## pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/NullHandlingIntegrationTest.java: ########## @@ -326,6 +326,31 @@ public void testAggregateServerReturnFinalResult(boolean useMultiStageQueryEngin assertTrue(response.get("resultTable").get("rows").get(0).get(0).isNull()); } + @Test + public void testWindowFunctionIgnoreNulls() Review Comment: I'm not sure I follow your suggestion - there's a single query here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org