aokolnychyi commented on code in PR #7897:
URL: https://github.com/apache/iceberg/pull/7897#discussion_r1242867386
##########
spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java:
##########
@@ -225,6 +249,43 @@ public void testRewriteDataFilesWithZOrder() {
assertEquals("Should have expected rows", expectedRows, sql("SELECT * FROM %s", tableName));
}
+ @Test
Review Comment:
There is a check below for the order of records. I just added a similar one
for the regular sort, so we verify the order of records is correct both in
regular sorts and in z-ordering.
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/SparkShufflingDataRewriter.java:
##########
@@ -59,7 +61,24 @@ abstract class SparkShufflingDataRewriter extends SparkSizeBasedDataRewriter {
public static final double COMPRESSION_FACTOR_DEFAULT = 1.0;
+ /**
+ * The number of shuffle partitions to use for each output file. By default, this file rewriter
+ * assumes each shuffle partition would become a separate output file. Attempting to generate
+ * large output files of 512 MB and more may strain the memory resources of the cluster as such
Review Comment:
Fixed.
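
For readers following the Javadoc quoted in the hunk above, here is a rough sketch of the arithmetic it describes: if each output file is fed by several shuffle partitions instead of one, every shuffle task handles a smaller slice of data. The class and method names below are hypothetical and are not taken from the PR, and the rounding is an assumption made for illustration; the actual logic in SparkShufflingDataRewriter may differ.

```java
// Illustrative sketch only; names are hypothetical and not Iceberg's API.
public final class ShufflePartitionSketch {

  private ShufflePartitionSketch() {}

  // Assumed rule: round up so that no output file exceeds the target size.
  public static long numOutputFiles(long inputSizeBytes, long targetFileSizeBytes) {
    if (inputSizeBytes < 0 || targetFileSizeBytes <= 0) {
      throw new IllegalArgumentException("Sizes must be non-negative and the target must be positive");
    }
    long files = (inputSizeBytes + targetFileSizeBytes - 1) / targetFileSizeBytes;
    return Math.max(1L, files);
  }

  // Total shuffle partitions = expected output files * shuffle partitions per file.
  public static long numShufflePartitions(
      long inputSizeBytes, long targetFileSizeBytes, int shufflePartitionsPerFile) {
    if (shufflePartitionsPerFile <= 0) {
      throw new IllegalArgumentException("Shuffle partitions per file must be positive");
    }
    return Math.multiplyExact(
        numOutputFiles(inputSizeBytes, targetFileSizeBytes), (long) shufflePartitionsPerFile);
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    // 2 GB of input with 512 MB target files: 4 output files.
    // One shuffle partition per file means 4 partitions of roughly 512 MB each.
    System.out.println(numShufflePartitions(2048 * mb, 512 * mb, 1));
    // Four shuffle partitions per file means 16 partitions of roughly 128 MB each.
    System.out.println(numShufflePartitions(2048 * mb, 512 * mb, 4));
  }
}
```

Under these assumptions, raising the per-file count trades more shuffle tasks for a smaller memory footprint per task, which appears to be the concern the Javadoc raises for output files of 512 MB and more.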
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]