Fokko commented on code in PR #13445:
URL: https://github.com/apache/iceberg/pull/13445#discussion_r2276781094
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -63,8 +64,9 @@ private SparkParquetWriters() {}
   @SuppressWarnings("unchecked")
   public static <T> ParquetValueWriter<T> buildWriter(StructType dfSchema, MessageType type) {
+    StructType writeSchema = PruneNullType.prune(dfSchema);
     return (ParquetValueWriter<T>)
-        ParquetWithSparkSchemaVisitor.visit(dfSchema, type, new WriteBuilder(type));
+        ParquetWithSparkSchemaVisitor.visit(writeSchema, type, new WriteBuilder(type));
Review Comment:
> That issue is a direct consequence of not representing unknown fields in the Parquet type. Maybe we should rethink that decision and filter the Parquet schema later, like when creating a Parquet file. For now, we can probably work around the issue by updating the visitor logic to iterate over the fields and account for missing `NullType`.
I explored that option as well, but I think we're already too far down that path. The tests started failing because the writer starts allocating Arrow buffers and attempting metrics collection.
I understand the potential issue, let me try to reproduce it 👍
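For context, here is a minimal, self-contained sketch of the pruning idea behind `PruneNullType.prune` (this is not Iceberg's actual implementation; `Field`, `NULL_TYPE`, and the `List`-based struct encoding are illustrative stand-ins for Spark's `StructField`/`DataType`): fields whose type is the null/unknown type are dropped, recursively, so the write schema only keeps fields the Parquet schema can represent.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PruneNullSketch {

  // Stand-in for Spark's StructField: a name plus a type, where the type is
  // either a leaf type name (e.g. "int") or a nested List<Field> struct.
  record Field(String name, Object type) {}

  // Stand-in for Spark's NullType, i.e. a field with no Parquet representation.
  static final String NULL_TYPE = "null";

  // Recursively drop fields typed as NULL_TYPE and descend into nested structs.
  @SuppressWarnings("unchecked")
  static List<Field> prune(List<Field> struct) {
    return struct.stream()
        .filter(f -> !NULL_TYPE.equals(f.type()))
        .map(f -> f.type() instanceof List<?> nested
            ? new Field(f.name(), prune((List<Field>) nested))
            : f)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Field> schema = List.of(
        new Field("id", "int"),
        new Field("unknown_col", NULL_TYPE),  // no Parquet representation
        new Field("nested", List.of(
            new Field("a", "string"),
            new Field("b", NULL_TYPE))));     // nested unknown field
    // Prints the surviving top-level field names: [id, nested]
    System.out.println(prune(schema).stream()
        .map(Field::name)
        .collect(Collectors.toList()));
  }
}
```

Pruning before the visitor runs keeps the Spark-side schema aligned field-by-field with the Parquet `MessageType`, which is why the visitor no longer trips over fields it cannot match.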
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]