Re: [PR] Add IGNORE NULLS option to FIRST_VALUE and LAST_VALUE window functions [pinot]

via GitHub Mon, 28 Oct 2024 06:25:16 -0700


yashmayya commented on code in PR #14264:
URL: https://github.com/apache/pinot/pull/14264#discussion_r1818981343



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/value/FirstValueWindowFunction.java:
##########
@@ -78,12 +86,78 @@ private List<Object> processRowsWindow(List<Object[]> rows) 
{
       lowerBound++;
       upperBound = Math.min(upperBound + 1, numRows - 1);
     }
+
+    return result;
+  }
+
+  private List<Object> processRowsWindowIgnoreNulls(List<Object[]> rows) {
+    int numRows = rows.size();
+    int lowerBound = _windowFrame.getLowerBound();
+    int upperBound = Math.min(_windowFrame.getUpperBound(), numRows - 1);
+
+    int indexOfFirstNonNullValue = -1;
+    // Find first non-null value in the first window
+    for (int i = Math.max(lowerBound, 0); i <= upperBound; i++) {
+      Object value = extractValueFromRow(rows.get(i));
+      if (value != null) {
+        indexOfFirstNonNullValue = i;
+        break;
+      }
+    }
+
+    List<Object> result = new ArrayList<>(numRows);
+
+    for (int i = 0; i < numRows; i++) {
+      if (lowerBound >= numRows) {
+        // Fill the remaining rows with null since all subsequent windows will 
be out of bounds
+        for (int j = i; j < numRows; j++) {
+          result.add(null);
+        }
+        break;
+      }
+
+      if (indexOfFirstNonNullValue != -1) {
+        result.add(extractValueFromRow(rows.get(indexOfFirstNonNullValue)));
+      } else {
+        result.add(null);
+      }
+
+      // Slide the window forward by one row; check if 
indexOfFirstNonNullValue is the lower bound which will not be in
+      // the next window. If so, find the next non-null value.
+      if (lowerBound >= 0 && indexOfFirstNonNullValue == lowerBound) {
+        indexOfFirstNonNullValue = -1;
+        // Find first non-null value for the next window
+        for (int j = lowerBound + 1; j <= Math.min(upperBound + 1, numRows - 
1); j++) {
+          Object value = extractValueFromRow(rows.get(j));
+          if (value != null) {
+            indexOfFirstNonNullValue = j;
+            break;
+          }
+        }
+      }
+      lowerBound++;

Review Comment:
   Yep, this convention was added in 
https://github.com/apache/pinot/pull/14273, and was chosen since it was the 
least invasive given the existing framework for window functions. I've 
documented this in a couple of places - 
https://github.com/apache/pinot/blob/0f984e81d3fce0d9d5bafb950cbbcf4b4dbc6763/pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/WindowNode.java#L33-L37
   
   
https://github.com/apache/pinot/blob/0f984e81d3fce0d9d5bafb950cbbcf4b4dbc6763/pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/WindowFrame.java#L31-L35
   
   > Although it can never turn into 0 due to the fact that maxRows is bound to 
Integer.MAX_VALUE... don't you think it is a bit difficult to understand?
   
   The logic here doesn't take that assumption into account though, we're 
handling all cases of `lowerBound` / `upperBound` (-ve / +ve). This way is also 
convenient because we don't need to worry about handling overflows everywhere.



##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/NullHandlingIntegrationTest.java:
##########
@@ -326,6 +326,31 @@ public void testAggregateServerReturnFinalResult(boolean 
useMultiStageQueryEngin
     assertTrue(response.get("resultTable").get("rows").get(0).get(0).isNull());
   }
 
+  @Test
+  public void testWindowFunctionIgnoreNulls()
+      throws Exception {
+    // Window functions are only supported in the multi-stage query engine
+    setUseMultiStageQueryEngine(true);
+    String sqlQuery =
+        "SELECT salary, LAST_VALUE(salary) IGNORE NULLS OVER (ORDER BY 
DaysSinceEpoch) AS gapfilledSalary from "

Review Comment:
   Yes, actually this does look like a legitimate bug in Calcite on taking a 
second look. The issue is that the inferred return type for the window function 
in the parsed `SqlNode` is `INTEGER NOT NULL` when `IGNORE NULLS` option is 
used (assuming the column input to `FIRST_VALUE` / `LAST_VALUE` is `INTEGER`) 
and the converted return type is `INTEGER` (nullable) because offset based 
window frame bounds means that the result can be `null` when the window frame 
is out of bounds - which can't be the case when using `RANGE` window frames or 
`ROWS` window frames with `UNBOUNDED PRECEDING` / `UNBOUNDED FOLLOWING` / 
`CURRENT ROW`. When `IGNORE NULLS` option is not provided, the inferred return 
type for the window function in the parsed `SqlNode` is also `INTEGER` 
(nullable) which is why the issue doesn't occur there. Same when `IGNORE NULLS` 
option is used but input column is nullable. 
   
   Interestingly, the same error and issue also occurs when the `RESPECT NULLS` 
option is explicitly provided.
   
   I'll create a  bug tracking Jira in the Calcite project and link it here.



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/value/FirstValueWindowFunction.java:
##########
@@ -113,12 +187,115 @@ private List<Object> processRangeWindow(List<Object[]> 
rows) {
     return result;
   }
 
-  private List<Object> processUnboundedPreceding(List<Object[]> rows) {
+  private List<Object> processRangeWindowIgnoreNulls(List<Object[]> rows) {
     int numRows = rows.size();
-    assert numRows > 0;
-    Object value = extractValueFromRow(rows.get(0));
-    Object[] result = new Object[numRows];
-    Arrays.fill(result, value);
-    return Arrays.asList(result);
+
+    if (_windowFrame.isUnboundedPreceding() && 
_windowFrame.isUnboundedFollowing()) {
+      // Find the first non-null value and fill it in all rows
+      for (int i = 0; i < numRows; i++) {
+        Object[] row = rows.get(i);
+        Object value = extractValueFromRow(row);
+        if (value != null) {
+          return fillAllWithValue(rows, value);
+        }
+      }
+      // There's no non-null value
+      return Collections.nCopies(numRows, null);
+    }
+
+    if (_windowFrame.isUnboundedPreceding() && 
_windowFrame.isUpperBoundCurrentRow()) {
+      // Find the first non-null value and fill it in all rows starting from 
the first row of the peer group of the row
+      // with the first non-null value
+      int firstNonNullValueIndex = -1;
+      Key firstNonNullValueKey = null;
+      for (int i = 0; i < numRows; i++) {
+        Object[] row = rows.get(i);
+        Object value = extractValueFromRow(row);
+        if (value != null) {
+          firstNonNullValueIndex = i;
+          firstNonNullValueKey = AggregationUtils.extractRowKey(row, 
_orderKeys);
+          break;
+        }
+      }
+
+      // No non-null values
+      if (firstNonNullValueIndex == -1) {
+        return Collections.nCopies(numRows, null);
+      }
+
+      List<Object> result = new ArrayList<>(numRows);
+      // Find the start of the peer group of the row with the first non-null 
value
+      int i;
+      for (i = 0; i < numRows; i++) {
+        Object[] row = rows.get(i);
+        Key orderKey = AggregationUtils.extractRowKey(row, _orderKeys);
+        if (orderKey.equals(firstNonNullValueKey)) {
+          break;
+        } else {
+          result.add(null);
+        }
+      }
+
+      Object firstNonNullValue = 
extractValueFromRow(rows.get(firstNonNullValueIndex));
+      for (; i < numRows; i++) {
+        result.add(firstNonNullValue);
+      }
+
+      return result;

Review Comment:
   Nice suggestion! I couldn't find any such off the shelf implementation 
either so I've created a small new class called `DualValueList` and added it to 
`pinot-common` in case we find use cases elsewhere in the codebase.



##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/NullHandlingIntegrationTest.java:
##########
@@ -326,6 +326,31 @@ public void testAggregateServerReturnFinalResult(boolean 
useMultiStageQueryEngin
     assertTrue(response.get("resultTable").get("rows").get(0).get(0).isNull());
   }
 
+  @Test
+  public void testWindowFunctionIgnoreNulls()

Review Comment:
   I'm not sure I follow your suggestion - there's a single query here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Re: [PR] Add IGNORE NULLS option to FIRST_VALUE and LAST_VALUE window functions [pinot]

Reply via email to