Copilot commented on code in PR #16038: URL: https://github.com/apache/pinot/pull/16038#discussion_r2134694099
########## pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java: ########## @@ -139,6 +174,11 @@ private List<Object[]> buildJoinedDataBlockUniqueKeys(MseBlock.Data leftBlock) { for (Object[] leftRow : leftRows) { Object key = _leftKeySelector.getKey(leftRow); + // Skip rows with null join keys - they should not participate in equi-joins per SQL standard + if (isNullKey(key)) { + handleUnmatchedLeftRow(leftRow, rows); Review Comment: The logic for handling null join keys is repeated in different join block methods. Consider extracting this null key handling into a separate helper method to reduce code duplication and improve maintainability. ```suggestion if (handleNullKey(key, leftRow, rows)) { ``` ########## pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/operator/HashJoinOperatorTest.java: ########## @@ -442,6 +442,170 @@ public void shouldHandleJoinWithPartialResultsWhenHitDataRowsLimitOnLeftInput() "Max rows in join should be reached"); } + @Test + public void shouldHandleHashJoinKeyCollisionLeftJoinWithNulls() { + // Test LEFT join with both hash collision AND null values + _leftInput = new BlockListMultiStageOperator.Builder(DEFAULT_CHILD_SCHEMA) + .addRow(1, "Aa") // Hash collision string + .addRow(2, "BB") // Hash collision string + .addRow(3, null) // Null key + .addRow(4, "CC") // Non-collision string + .buildWithEos(); + + _rightInput = new BlockListMultiStageOperator.Builder(DEFAULT_CHILD_SCHEMA) + .addRow(2, "Aa") // Hash collision match + .addRow(2, "BB") // Hash collision match + .addRow(3, null) // Null key - should NOT match left null + .addRow(5, "DD") // No match in left + .buildWithEos(); + + DataSchema resultSchema = new DataSchema( + new String[]{"int_col1", "string_col1", "int_col2", "string_col2"}, + new ColumnDataType[]{ColumnDataType.INT, ColumnDataType.STRING, ColumnDataType.INT, ColumnDataType.STRING}); + + HashJoinOperator operator = getOperator(resultSchema, JoinRelType.LEFT, List.of(1), List.of(1), List.of()); + List<Object[]> resultRows = ((MseBlock.Data) operator.nextBlock()).asRowHeap().getRows(); + + assertEquals(resultRows.size(), 4); Review Comment: [nitpick] Consider switching the order of parameters in assertEquals to follow the expected-first convention (e.g., assertEquals(4, resultRows.size())) for clearer error messages during test failures. ```suggestion assertEquals(4, resultRows.size()); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org