RussellSpitzer commented on code in PR #6371:
URL: https://github.com/apache/iceberg/pull/6371#discussion_r1042357442


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java:
##########
@@ -42,4 +42,9 @@ private SparkSQLProperties() {}
   // Controls whether to check the order of fields during writes
   public static final String CHECK_ORDERING = "spark.sql.iceberg.check-ordering";
   public static final boolean CHECK_ORDERING_DEFAULT = true;
+
+  // Controls whether to preserve the existing grouping of data while planning splits
+  public static final String PRESERVE_DATA_GROUPING =
+      "spark.sql.iceberg.split.preserve-data-grouping";
+  public static final boolean PRESERVE_DATA_GROUPING_DEFAULT = false;

Review Comment:
   My question here is whether the default should really be false. I have no
problem with the name, but preserving the grouping feels like it is probably
always the right decision for a scan with possible joins. We should probably
look into this more in the future, but my guess is that if a query requires
partition columns, we should group by those columns.

   That's my long way of saying I'm fine with the default being false for now,
but I think it should probably be true.
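
   To make the effect of the default concrete, here is a minimal sketch of how such a flag might be resolved. The two constants are copied from the diff above; the `preserveDataGrouping` helper and the plain `Map` standing in for the session configuration are hypothetical and are not the code in this PR.

```java
import java.util.HashMap;
import java.util.Map;

public class PreserveDataGroupingExample {
  // Constants as they appear in the diff above.
  static final String PRESERVE_DATA_GROUPING =
      "spark.sql.iceberg.split.preserve-data-grouping";
  static final boolean PRESERVE_DATA_GROUPING_DEFAULT = false;

  // Hypothetical helper: a plain Map stands in for the Spark session conf.
  // Returns the configured value, or the default when the key is unset.
  static boolean preserveDataGrouping(Map<String, String> sessionConf) {
    String value = sessionConf.get(PRESERVE_DATA_GROUPING);
    return value == null
        ? PRESERVE_DATA_GROUPING_DEFAULT
        : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();

    // Key unset: the default (false) applies, so grouping is not preserved.
    System.out.println(preserveDataGrouping(conf));

    // Users who want grouping preserved have to opt in explicitly.
    conf.put(PRESERVE_DATA_GROUPING, "true");
    System.out.println(preserveDataGrouping(conf));
  }
}
```

   With the default at false, every query that would benefit from preserved grouping needs the explicit opt-in shown in the second call; flipping the default to true would reverse that burden.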



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

