slfan1989 commented on code in PR #13907:
URL: https://github.com/apache/iceberg/pull/13907#discussion_r2317259985
##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java:
##########
@@ -133,7 +133,47 @@ public void removeTables() {
sql("DROP TABLE IF EXISTS %s", tableName(OTHER_TABLE_NAME));
}
- // TODO: add tests for truncate transforms once SPARK-40295 is released
+ // TODO: SPJ does not currently leverage truncate(...) partition transforms
for partition
+ // alignment.
+ // SPARK-40295 improved related areas, but full truncate support is tracked
in SPARK-50593.
+ // This test documents current behavior; update/extend once SPARK-50593
lands.
+ @TestTemplate
+ public void testJoinWithTruncatePartitioning() {
+ sql(
+ "CREATE TABLE %s (id BIGINT, int_col INT, dep STRING) "
+ + "USING iceberg "
+ + "PARTITIONED BY (truncate(4, dep)) "
+ + "TBLPROPERTIES (%s)",
+ tableName, tablePropsAsString(TABLE_PROPERTIES));
+
+ sql("INSERT INTO %s VALUES (1L, 100, 'software')", tableName);
+ sql("INSERT INTO %s VALUES (2L, 200, 'software')", tableName);
+ sql("INSERT INTO %s VALUES (3L, 300, 'software')", tableName);
+
+ sql(
+ "CREATE TABLE %s (id BIGINT, int_col INT, dep STRING) "
+ + "USING iceberg "
+ + "PARTITIONED BY (truncate(4, dep)) "
+ + "TBLPROPERTIES (%s)",
+ tableName(OTHER_TABLE_NAME), tablePropsAsString(TABLE_PROPERTIES));
+
+ sql("INSERT INTO %s VALUES (1L, 100, 'software')",
tableName(OTHER_TABLE_NAME));
+ sql("INSERT INTO %s VALUES (2L, 200, 'software')",
tableName(OTHER_TABLE_NAME));
+ sql("INSERT INTO %s VALUES (3L, 300, 'software')",
tableName(OTHER_TABLE_NAME));
+
+ // TODO(SPARK-50593): Once truncate transforms are leveraged by SPJ,
expected shuffles with SPJ
Review Comment:
Thank you for your feedback! I understand your point and agree to some
extent. However, I believe we should at least improve the comments to indicate
that, even though SPARK-40295 is ready, we still need to wait for SPARK-50593
before we can proceed with this TODO.
Let me briefly explain the reason for this improvement:
When I first discovered this TODO, I checked SPARK-40295 and found that it
was ready. I then tested it locally and found that, even with SPARK-40295
implemented, truncate still cannot utilize Spark's SPJ features. My initial
thought was to directly modify the TODO, but I was concerned that a review
might inquire about the issue I encountered. So, I decided to document my
verification tests here. Currently, this test can only verify that, under
SPARK-40295, truncate still does not support SPJ.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]