amogh-jahagirdar commented on code in PR #13107:
URL: https://github.com/apache/iceberg/pull/13107#discussion_r2098277436


##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSchemaUtil.java:
##########
@@ -127,6 +137,76 @@ public static Schema convert(StructType sparkType) {
     return new Schema(converted.asNestedType().asStructType().fields());
   }
 
+  public static Schema convert(List<org.apache.spark.sql.connector.catalog.Column> columns) {
+    // Create a Spark struct type from the list of columns and run through the existing conversion
+    // visitor to assign IDs
+    StructType sparkSchema = new StructType();
+    Map<String, org.apache.spark.sql.connector.catalog.Column> columnsByName = Maps.newHashMap();
+    for (org.apache.spark.sql.connector.catalog.Column column : columns) {
+      sparkSchema =
+          sparkSchema.add(
+              column.name(),
+              column.dataType(),
+              column.nullable(),
+              column.metadataInJSON() != null
+                  ? Metadata.fromJson(column.metadataInJSON())
+                  : Metadata.empty());
+      columnsByName.put(column.name(), column);
+    }
+
+    Schema icebergSchema = convert(sparkSchema);
+
+    // Do a second pass over the fields to set the default values
+    List<Types.NestedField> fieldsWithDefaults = Lists.newArrayList();
+    for (Types.NestedField field : icebergSchema.columns()) {
+      org.apache.spark.sql.connector.catalog.Column column = columnsByName.get(field.name());
+      if (column.defaultValue() == null) {
+        fieldsWithDefaults.add(field);
+      } else {
+        Literal<?> icebergDefaultValue =
+            convertSparkDefaultValueToLiteral(column.defaultValue(), field.type(), field.name());
+        fieldsWithDefaults.add(
+            Types.NestedField.from(field)
+                .withInitialDefault(icebergDefaultValue)
+                .withWriteDefault(icebergDefaultValue)
+                .build());
+      }
+    }
+
+    return new Schema(fieldsWithDefaults);
+  }
+
+  static Literal<?> convertSparkDefaultValueToLiteral(
+      ColumnDefaultValue defaultValue, Type type, String fieldName) {
+    if (TypeUtil.find(type, t -> t.typeId() == Type.TypeID.STRUCT) != null) {
+      throw new UnsupportedOperationException(
+          String.format("Adding struct %s with default value is not supported", fieldName));
+    }
+
+    org.apache.spark.sql.catalyst.expressions.Expression expression =
+        SQL_PARSER.parseExpression(defaultValue.getSql());
+
+    if (!supportedExpressionForDefaultValues(expression)) {
+      throw new UnsupportedOperationException(
+          String.format(
+              "Adding expression of type %s for default value for field %s is not supported",
+              expression.prettyName(), fieldName));
+    }
+
+    org.apache.spark.sql.connector.expressions.Literal<?> sparkLiteral = defaultValue.getValue();
+    if (sparkLiteral.value() instanceof UTF8String) {
+      return Expressions.lit(((UTF8String) sparkLiteral.value()).toString());
+    } else if (sparkLiteral.value() instanceof Decimal) {
+      return Expressions.lit(((Decimal) sparkLiteral.value()).toJavaBigDecimal());
+    }
+    return Expressions.lit(sparkLiteral.value());
+  }
+
+  private static boolean supportedExpressionForDefaultValues(
+      org.apache.spark.sql.catalyst.expressions.Expression expression) {
+    return expression instanceof org.apache.spark.sql.catalyst.expressions.Literal;

Review Comment:
   For the initial implementation, `map` and `list` functions don't work, but we can add extra logic here to support them.
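
   As an illustration of what that extra logic could look like, the check might be widened to accept Catalyst's array/map constructor expressions whose children are all themselves supported. This is only a sketch against the Spark Catalyst API (`CreateArray`/`CreateMap` are Catalyst expression classes), not a tested implementation:
   
   ```java
   // Sketch only: admit array/map constructors built entirely from supported
   // child expressions, in addition to plain literals.
   private static boolean supportedExpressionForDefaultValues(
       org.apache.spark.sql.catalyst.expressions.Expression expression) {
     if (expression instanceof org.apache.spark.sql.catalyst.expressions.Literal) {
       return true;
     }
   
     if (expression instanceof org.apache.spark.sql.catalyst.expressions.CreateArray
         || expression instanceof org.apache.spark.sql.catalyst.expressions.CreateMap) {
       // Recurse so that only nested expressions we can convert are allowed
       return scala.collection.JavaConverters.seqAsJavaList(expression.children()).stream()
           .allMatch(SparkSchemaUtil::supportedExpressionForDefaultValues);
     }
   
     return false;
   }
   ```
   
   Converting the accepted `CreateArray`/`CreateMap` values to Iceberg literals would still need corresponding handling in `convertSparkDefaultValueToLiteral`.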



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSchemaUtil.java:
##########
@@ -127,6 +137,76 @@ public static Schema convert(StructType sparkType) {
     return new Schema(converted.asNestedType().asStructType().fields());
   }
 
+  public static Schema convert(List<org.apache.spark.sql.connector.catalog.Column> columns) {

Review Comment:
   Since this is the utility that carries default values over when converting to the Iceberg schema, and `convert(StructType sparkType)` does not, we should probably deprecate the other one so that users don't call that API and silently drop their default values.
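
   If deprecation is the route taken, a minimal sketch could look like the following (the javadoc wording and the choice to keep the old body unchanged are placeholders, not a proposal for the final text):
   
   ```java
   /**
    * Converts a Spark {@link StructType} to an Iceberg {@link Schema}.
    *
    * @deprecated use {@code convert(List<Column>)} instead, which also carries
    *     column default values over to the Iceberg schema; this overload drops
    *     them.
    */
   @Deprecated
   public static Schema convert(StructType sparkType) {
     // existing body unchanged
   }
   ```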



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
