Re: [PR] Add IGNORE NULLS option to FIRST_VALUE and LAST_VALUE window functions [pinot]

via GitHub Mon, 28 Oct 2024 04:17:08 -0700


gortiz commented on code in PR #14264:
URL: https://github.com/apache/pinot/pull/14264#discussion_r1818880977



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/value/FirstValueWindowFunction.java:
##########
@@ -113,12 +187,115 @@ private List<Object> processRangeWindow(List<Object[]> 
rows) {
     return result;
   }
 
-  private List<Object> processUnboundedPreceding(List<Object[]> rows) {
+  private List<Object> processRangeWindowIgnoreNulls(List<Object[]> rows) {
     int numRows = rows.size();
-    assert numRows > 0;
-    Object value = extractValueFromRow(rows.get(0));
-    Object[] result = new Object[numRows];
-    Arrays.fill(result, value);
-    return Arrays.asList(result);
+
+    if (_windowFrame.isUnboundedPreceding() && 
_windowFrame.isUnboundedFollowing()) {
+      // Find the first non-null value and fill it in all rows
+      for (int i = 0; i < numRows; i++) {
+        Object[] row = rows.get(i);
+        Object value = extractValueFromRow(row);
+        if (value != null) {
+          return fillAllWithValue(rows, value);
+        }
+      }
+      // There's no non-null value
+      return Collections.nCopies(numRows, null);
+    }
+
+    if (_windowFrame.isUnboundedPreceding() && 
_windowFrame.isUpperBoundCurrentRow()) {
+      // Find the first non-null value and fill it in all rows starting from 
the first row of the peer group of the row
+      // with the first non-null value
+      int firstNonNullValueIndex = -1;
+      Key firstNonNullValueKey = null;
+      for (int i = 0; i < numRows; i++) {
+        Object[] row = rows.get(i);
+        Object value = extractValueFromRow(row);
+        if (value != null) {
+          firstNonNullValueIndex = i;
+          firstNonNullValueKey = AggregationUtils.extractRowKey(row, 
_orderKeys);
+          break;
+        }
+      }
+
+      // No non-null values
+      if (firstNonNullValueIndex == -1) {
+        return Collections.nCopies(numRows, null);
+      }
+
+      List<Object> result = new ArrayList<>(numRows);
+      // Find the start of the peer group of the row with the first non-null 
value
+      int i;
+      for (i = 0; i < numRows; i++) {
+        Object[] row = rows.get(i);
+        Key orderKey = AggregationUtils.extractRowKey(row, _orderKeys);
+        if (orderKey.equals(firstNonNullValueKey)) {
+          break;
+        } else {
+          result.add(null);
+        }
+      }
+
+      Object firstNonNullValue = 
extractValueFromRow(rows.get(firstNonNullValueIndex));
+      for (; i < numRows; i++) {
+        result.add(firstNonNullValue);
+      }
+
+      return result;

Review Comment:
   nit: Here we could reduce allocation cost if using a custom list that could 
combine two lists. I'm shocked about the lack of a list like that in Guava, but 
shouldn't be difficult to create one. The idea would be to create two lists 
like Collections.nCopies. The first would contain just nulls and the second 
just the first non null value. Finally we wrap these two instances in a view 
whose get delegates on either the first or the second list depending on whether 
the index is greater or smaller than firstNonNullValueIndex.
   
   If this pattern is seen in more window functions, to create this combine 
list view would worth the effort



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Re: [PR] Add IGNORE NULLS option to FIRST_VALUE and LAST_VALUE window functions [pinot]

Reply via email to