amogh-jahagirdar commented on code in PR #10074:
URL: https://github.com/apache/iceberg/pull/10074#discussion_r1561209606
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseCatalog.java:
##########
@@ -21,19 +21,31 @@
 import org.apache.iceberg.spark.procedures.SparkProcedures;
 import org.apache.iceberg.spark.procedures.SparkProcedures.ProcedureBuilder;
 import org.apache.iceberg.spark.source.HasIcebergCatalog;
+import org.apache.iceberg.util.PropertyUtil;
 import org.apache.spark.sql.catalyst.analysis.NoSuchProcedureException;
 import org.apache.spark.sql.connector.catalog.Identifier;
 import org.apache.spark.sql.connector.catalog.StagingTableCatalog;
 import org.apache.spark.sql.connector.catalog.SupportsNamespaces;
 import org.apache.spark.sql.connector.iceberg.catalog.Procedure;
 import org.apache.spark.sql.connector.iceberg.catalog.ProcedureCatalog;
+import org.apache.spark.sql.util.CaseInsensitiveStringMap;

 abstract class BaseCatalog
     implements StagingTableCatalog,
         ProcedureCatalog,
         SupportsNamespaces,
         HasIcebergCatalog,
         SupportsFunctions {

+  /**
+   * Controls whether to mark all the fields as nullable when executing CREATE/REPLACE TABLE ... AS
+   * SELECT ... and creating the table. If false, fields' nullability will be preserved when
+   * creating the table.
+   */
+  private static final String TABLE_CREATE_NULLABLE_QUERY_SCHEMA = "use-nullable-query-schema";

Review Comment:
   Not super opinionated though; if there's something in Spark that's already following this naming, then I'd agree to just follow that, since it's less of a burden on a user to be aware of these different namings.

Review Comment:
   Hm, if there's a pointer to where `use-nullable-query-schema` appears in the Spark API, that would be good to see (I couldn't find anything by searching). IMO "preserve" would be a better verb than "use", and the name should also make clear that this applies to CTAS/RTAS, since that makes it clearer we are essentially preserving the nullability from the source query. So something like `preserve-ctas-rtas-nullability` feels a bit more direct.
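For readers following the naming discussion, here is a minimal sketch of how a catalog flag like the one in the diff could be wired up: the option is read once from the Spark catalog options and then consulted when deriving the CTAS/RTAS table schema from the query schema. The class name, the `schemaForCreate` helper, and the default of `true` are illustrative assumptions, not the PR's actual implementation; `PropertyUtil.propertyAsBoolean` and `CaseInsensitiveStringMap` correspond to the imports added in the diff above.

```java
// Illustrative sketch only; not the PR's implementation. Assumes the flag
// defaults to true (i.e. the existing behavior of relaxing CTAS/RTAS schemas
// to all-nullable fields).
import java.util.Arrays;

import org.apache.iceberg.util.PropertyUtil;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

abstract class NullableQuerySchemaSketch {
  // Same option name as in the diff above; the surrounding class is hypothetical.
  private static final String TABLE_CREATE_NULLABLE_QUERY_SCHEMA = "use-nullable-query-schema";

  private boolean useNullableQuerySchema = true;

  // A Spark TableCatalog receives its options through initialize(name, options);
  // the flag would be read once there.
  public void initialize(String name, CaseInsensitiveStringMap options) {
    this.useNullableQuerySchema =
        PropertyUtil.propertyAsBoolean(options, TABLE_CREATE_NULLABLE_QUERY_SCHEMA, true);
  }

  // Applied to the query schema before the table is created for CTAS/RTAS.
  protected StructType schemaForCreate(StructType querySchema) {
    if (!useNullableQuerySchema) {
      // Preserve the nullability reported by the source query.
      return querySchema;
    }

    // Relax every top-level field to nullable (nested fields are left as-is in this sketch).
    StructField[] relaxed =
        Arrays.stream(querySchema.fields())
            .map(f -> new StructField(f.name(), f.dataType(), true, f.metadata()))
            .toArray(StructField[]::new);
    return new StructType(relaxed);
  }
}
```

Whatever the option is ultimately named, the behavior it gates is the same: with the flag on, the created table drops NOT NULL constraints from the query schema; with it off, the source query's nullability is carried into the new table.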