Re: [PR] Spark: Read/Write `UnknownType` [iceberg]

via GitHub Mon, 18 Aug 2025 14:10:50 -0700


Fokko commented on code in PR #13445:
URL: https://github.com/apache/iceberg/pull/13445#discussion_r2283477317



##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -79,6 +84,123 @@ public static <T> ParquetValueWriter<T> buildWriter(StructType dfSchema, Message
         ParquetWithSparkSchemaVisitor.visit(dfSchema, type, new WriteBuilder(type));
   }
 
+  @SuppressWarnings("unchecked")
+  public static <T> ParquetValueWriter<T> buildWriter(Schema iSchema, MessageType type) {

Review Comment:
   > The other downside is the duplicates/similarities between `WriteBuilder` and `IcebergWriteBuilder`. Maybe those two callers from the `SparkFileWriterFactory` can also switch to the Iceberg schema? Then we can remove/deprecate the other `buildWriter` method that uses the Spark type.
   
   Just pushed a commit to fix that.
   
   > Conceptually, the existing `WriteBuilder` also needs to traverse the schema, so there may not be much of a difference.
   
   The difference is that we traverse the tree in post-order and convert the tree to a Spark type at each of the nodes. Let me see if we can do some memoization.
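
To make the cost concrete, here is a minimal sketch of the memoization idea on a toy tree. It does not use the Iceberg or Spark types from this PR: `Node`, `convert`, `convertMemoized`, and the visit helpers are hypothetical names chosen for illustration, and the string-based "conversion" only stands in for building a Spark type. The assumption is that the conversion walks the whole subtree below the node it is given, so calling it at every node during a post-order visit repeats work, while a cache keyed on the node lets each subtree be converted once.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class MemoizedConversionSketch {

  /** Toy schema node; a stand-in, not an Iceberg or Spark type. */
  static final class Node {
    final String name;
    final List<Node> children;

    Node(String name, Node... children) {
      this.name = name;
      this.children = List.of(children);
    }
  }

  /** Number of nodes touched by conversions, to make the cost visible. */
  static int nodesConverted;

  /** Hypothetical conversion that walks the entire subtree below the node. */
  static String convert(Node node) {
    nodesConverted++;
    if (node.children.isEmpty()) {
      return node.name;
    }
    List<String> parts = new ArrayList<>();
    for (Node child : node.children) {
      parts.add(convert(child));
    }
    return node.name + "<" + String.join(",", parts) + ">";
  }

  /** Same conversion, but each node's result is computed once and reused. */
  static String convertMemoized(Node node, Map<Node, String> cache) {
    String cached = cache.get(node);
    if (cached != null) {
      return cached;
    }
    nodesConverted++;
    String result = node.name;
    if (!node.children.isEmpty()) {
      List<String> parts = new ArrayList<>();
      for (Node child : node.children) {
        parts.add(convertMemoized(child, cache));
      }
      result = node.name + "<" + String.join(",", parts) + ">";
    }
    cache.put(node, result);
    return result;
  }

  /** Post-order visit that converts at every node, without a cache. */
  static void visitNaive(Node node, List<String> out) {
    for (Node child : node.children) {
      visitNaive(child, out);
    }
    out.add(convert(node));
  }

  /** Post-order visit that converts at every node through the cache. */
  static void visitMemoized(Node node, Map<Node, String> cache, List<String> out) {
    for (Node child : node.children) {
      visitMemoized(child, cache, out);
    }
    out.add(convertMemoized(node, cache));
  }

  /** Four-node sample: struct(long, list(string)). */
  static Node sampleTree() {
    return new Node("struct", new Node("long"), new Node("list", new Node("string")));
  }

  static int countNaive(Node root) {
    nodesConverted = 0;
    visitNaive(root, new ArrayList<>());
    return nodesConverted;
  }

  static int countMemoized(Node root) {
    nodesConverted = 0;
    // Identity-based cache: nodes are keyed by reference, not by value.
    visitMemoized(root, new IdentityHashMap<>(), new ArrayList<>());
    return nodesConverted;
  }

  public static void main(String[] args) {
    Node root = sampleTree();
    System.out.println("naive conversions:    " + countNaive(root));
    System.out.println("memoized conversions: " + countMemoized(root));
  }
}
```

Because the visit is post-order, every child is already in the cache by the time its parent is converted, so the memoized variant touches each node exactly once. On the four-node sample tree that is 4 conversions instead of 8, and the gap widens with nesting depth.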



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

