Re: [PR] Fix null join key matching in multi-stage hash joins [pinot]

via GitHub Sun, 08 Jun 2025 06:10:13 -0700


Copilot commented on code in PR #16038:
URL: https://github.com/apache/pinot/pull/16038#discussion_r2134694099



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -139,6 +174,11 @@ private List<Object[]> 
buildJoinedDataBlockUniqueKeys(MseBlock.Data leftBlock) {
 
     for (Object[] leftRow : leftRows) {
       Object key = _leftKeySelector.getKey(leftRow);
+      // Skip rows with null join keys - they should not participate in 
equi-joins per SQL standard
+      if (isNullKey(key)) {
+        handleUnmatchedLeftRow(leftRow, rows);

Review Comment:
   The logic for handling null join keys is repeated in different join block 
methods. Consider extracting this null key handling into a separate helper 
method to reduce code duplication and improve maintainability.
   ```suggestion
         if (handleNullKey(key, leftRow, rows)) {
   ```



##########
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/operator/HashJoinOperatorTest.java:
##########
@@ -442,6 +442,170 @@ public void 
shouldHandleJoinWithPartialResultsWhenHitDataRowsLimitOnLeftInput()
         "Max rows in join should be reached");
   }
 
+  @Test
+  public void shouldHandleHashJoinKeyCollisionLeftJoinWithNulls() {
+    // Test LEFT join with both hash collision AND null values
+    _leftInput = new BlockListMultiStageOperator.Builder(DEFAULT_CHILD_SCHEMA)
+            .addRow(1, "Aa")    // Hash collision string
+            .addRow(2, "BB")    // Hash collision string
+            .addRow(3, null)    // Null key
+            .addRow(4, "CC")    // Non-collision string
+            .buildWithEos();
+
+    _rightInput = new BlockListMultiStageOperator.Builder(DEFAULT_CHILD_SCHEMA)
+            .addRow(2, "Aa")    // Hash collision match
+            .addRow(2, "BB")    // Hash collision match
+            .addRow(3, null)    // Null key - should NOT match left null
+            .addRow(5, "DD")    // No match in left
+            .buildWithEos();
+
+    DataSchema resultSchema = new DataSchema(
+            new String[]{"int_col1", "string_col1", "int_col2", "string_col2"},
+            new ColumnDataType[]{ColumnDataType.INT, ColumnDataType.STRING, 
ColumnDataType.INT, ColumnDataType.STRING});
+
+    HashJoinOperator operator = getOperator(resultSchema, JoinRelType.LEFT, 
List.of(1), List.of(1), List.of());
+    List<Object[]> resultRows = ((MseBlock.Data) 
operator.nextBlock()).asRowHeap().getRows();
+
+    assertEquals(resultRows.size(), 4);

Review Comment:
   [nitpick] Consider switching the order of parameters in assertEquals to 
follow the expected-first convention (e.g., assertEquals(4, resultRows.size())) 
for clearer error messages during test failures.
   ```suggestion
       assertEquals(4, resultRows.size());
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Re: [PR] Fix null join key matching in multi-stage hash joins [pinot]

Reply via email to