huaxingao commented on code in PR #13167:
URL: https://github.com/apache/iceberg/pull/13167#discussion_r2155709703
##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java:
##########
@@ -549,6 +555,146 @@ public void testJoinsWithMismatchingPartitionKeys() {
         tableName(OTHER_TABLE_NAME));
   }
 
+  @TestTemplate
+  public void testJoinsCompatibleBucketNumbers() {
+    sql(
+        "CREATE TABLE %s (id BIGINT, int_col INT, dep STRING)"
+            + "USING iceberg "
+            + "PARTITIONED BY (bucket(4, id))"
+            + "TBLPROPERTIES (%s)",
+        tableName, tablePropsAsString(TABLE_PROPERTIES));
+
+    sql(
+        "INSERT INTO %s VALUES "
+            + "(1L, 100, 'software'),"
+            + "(2L, 101, 'hr'),"
+            + "(3L, 102, 'operation'),"
+            + "(4L, 103, 'sales'),"
+            + "(5L, 104, 'marketing'),"
+            + "(6L, 105, 'pr')",
+        tableName);
+
+    sql(
+        "CREATE TABLE %s (id BIGINT, int_col INT, dep STRING)"
+            + "USING iceberg "
+            + "PARTITIONED BY (bucket(6, id))"
+            + "TBLPROPERTIES (%s)",
+        tableName(OTHER_TABLE_NAME), tablePropsAsString(TABLE_PROPERTIES));
+
+    sql(
+        "INSERT INTO %s VALUES "
+            + "(1L, 100, 'software'),"
+            + "(3L, 300, 'hardware'),"
+            + "(4L, 103, 'sales'),"
+            + "(5L, 104, 'marketing'),"
+            + "(6L, 105, 'pr')",
+        tableName(OTHER_TABLE_NAME));
+
+    assertPartitioningAwarePlan(
+        1, /* expected num of shuffles with SPJ */
+        3, /* expected num of shuffles without SPJ */
+        "SELECT * "
+            + "FROM %s t1 "
+            + "INNER JOIN %s t2 "
+            + "ON t1.id = t2.id "
+            + "ORDER BY t1.id, t1.int_col, t1.dep, t2.id, t2.int_col, t2.dep",
+        tableName,
+        tableName(OTHER_TABLE_NAME));
+  }
+
+  @TestTemplate
+  public void testJoinsWithEqualBucketNumbers() {

Review Comment:
Yeah, you're right: SPJ still happens for 4 vs. 8. I wanted to test the case where the reducer returns null for the side with 4 buckets (since gcd(4, 8) == thisNumBuckets == 4, that side needs no reduction). SPJ still applies overall, but only the 8-bucket side needs to be reduced. I just wanted to make sure that specific branch in the logic is covered.
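To make the branch described above concrete, here is a minimal, hypothetical Java sketch of gcd-based bucket reduction. It is not Iceberg's or Spark's actual reducer API: the class `BucketReducerSketch` and its `reducer(int, int)` method are illustrative names, and returning `null` to mean "this side needs no reduction" is assumed to mirror the behaviour described in the comment. The reduction `bucket % gcd` is valid because the gcd divides this side's bucket count, so `(h % n) % g == h % g` for any non-negative hash `h`.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch (not Iceberg's BucketFunction): models how two bucket
// transforms with different bucket counts can be aligned on gcd(n1, n2) buckets.
public final class BucketReducerSketch {

  private BucketReducerSketch() {}

  // Euclid's algorithm for the greatest common divisor.
  static int gcd(int a, int b) {
    while (b != 0) {
      int t = a % b;
      a = b;
      b = t;
    }
    return a;
  }

  // Returns a mapping from this side's bucket IDs to gcd(this, other) buckets,
  // or null when this side already has exactly gcd buckets (assumed to mean
  // "no reduction needed", e.g. the 4-bucket side of a 4 vs. 8 join).
  static IntUnaryOperator reducer(int thisNumBuckets, int otherNumBuckets) {
    int common = gcd(thisNumBuckets, otherNumBuckets);
    if (common == thisNumBuckets) {
      return null;
    }
    // Safe because common divides thisNumBuckets: (h % n) % common == h % common.
    return bucket -> bucket % common;
  }

  public static void main(String[] args) {
    System.out.println(reducer(4, 8) == null);       // true: 4-bucket side left as-is
    System.out.println(reducer(8, 4).applyAsInt(6)); // 2: bucket 6 of 8 -> bucket 2 of 4
    System.out.println(reducer(4, 6).applyAsInt(3)); // 1: 4 vs. 6 reduces to 2 buckets
    System.out.println(reducer(6, 4).applyAsInt(5)); // 1
  }
}
```

With 4 vs. 8 buckets, only the 8-bucket side gets a reducer (`bucket % 4`), which is the null-returning branch the comment wants covered. With 4 vs. 6, as in the compatible-bucket test in the diff, neither count equals the gcd, so both sides are reduced to 2 buckets.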