Re: [PR] Spark 3.5: Skip sort for incomparable data types in CreateChangelogViewProcedure [iceberg]

via GitHub Fri, 30 Aug 2024 11:02:36 -0700


dramaticlly commented on code in PR #11045:
URL: https://github.com/apache/iceberg/pull/11045#discussion_r1739209075



##########
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCreateChangelogViewProcedure.java:
##########
@@ -242,6 +249,34 @@ public void testUpdateWithIdentifierField() {
         sql("select * from %s order by _change_ordinal, id, data", viewName));
   }
 
+  @TestTemplate
+  public void testUpdateWithInComparableType() {
+    createTableWithIncomparableType();
+
+    sql("INSERT INTO %s VALUES (2, Map('b','b'), 12)", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Snapshot snap1 = table.currentSnapshot();
+
+    sql("INSERT OVERWRITE %s VALUES (3, Map('c','c'), 13), (2, Map('d','d'), 
12)", tableName);

Review Comment:
   Let me trying to think out loud
   - Given the identifier column is missing, we are basically sort all columns 
to generate changelog, here we are skip sorting on the columns which cannot be 
ordered. I dont feel like we will lose on detect the change as we did not 
filter any row out in the results
   - If identifier columns are added to the table or provided in procedure 
explicitly, [the repartition spec will only include such instead of all 
columns](https://github.com/apache/iceberg/blob/main/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/CreateChangelogViewProcedure.java#L164-L176)
 and later used for sorting 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Spark 3.5: Skip sort for incomparable data types in CreateChangelogViewProcedure [iceberg]

Reply via email to