pvary commented on code in PR #15328:
URL: https://github.com/apache/iceberg/pull/15328#discussion_r2822960079
##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -75,10 +77,27 @@
public class SparkParquetWriters {
private SparkParquetWriters() {}
- @SuppressWarnings("unchecked")
public static <T> ParquetValueWriter<T> buildWriter(StructType dfSchema, MessageType type) {
+ return buildWriter(null, type, dfSchema);
+ }
+
+ @SuppressWarnings("unchecked")
+ public static <T> ParquetValueWriter<T> buildWriter(
+ Schema icebergSchema, MessageType type, StructType dfSchema) {
+ return (ParquetValueWriter<T>)
+ ParquetWithSparkSchemaVisitor.visit(
+ dfSchema != null ? dfSchema : SparkSchemaUtil.convert(icebergSchema),
+ type,
+ new WriteBuilder(type));
+ }
+
+ public static <T> ParquetValueWriter<T> buildWriter(
+ StructType dfSchema, MessageType type, Schema icebergSchema) {
return (ParquetValueWriter<T>)
- ParquetWithSparkSchemaVisitor.visit(dfSchema, type, new WriteBuilder(type));
+ ParquetWithSparkSchemaVisitor.visit(
+ dfSchema != null ? dfSchema : SparkSchemaUtil.convert(icebergSchema),
+ type,
+ new WriteBuilder(type));
}
Review Comment:
I’ve given this quite a bit of thought. On the caller side we use the
order `icebergSchema`, `fileSchema`, `engineSchema`, and I believe this is the
most logical ordering. If anyone feels strongly otherwise, I’m happy to adjust
it.
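
To make the ordering concrete, here is a rough, self-contained sketch of how
the call sites would read. `IcebergSchema`, `FileSchema`, `EngineSchema`,
`convert` and the `String` return value are hypothetical stand-ins for
`Schema`, `MessageType`, `StructType`, `SparkSchemaUtil.convert` and
`ParquetValueWriter`, so it runs without the Iceberg, Parquet, or Spark jars.
It is an illustration, not the code in this PR.

```java
import java.util.Arrays;
import java.util.List;

public class WriterOrderingSketch {
  // Hypothetical stand-in for org.apache.iceberg.Schema.
  static final class IcebergSchema {
    final List<String> columns;
    IcebergSchema(List<String> columns) { this.columns = columns; }
  }

  // Hypothetical stand-in for Parquet's MessageType (the file schema).
  static final class FileSchema {
    final List<String> columns;
    FileSchema(List<String> columns) { this.columns = columns; }
  }

  // Hypothetical stand-in for Spark's StructType (the engine schema).
  static final class EngineSchema {
    final List<String> columns;
    EngineSchema(List<String> columns) { this.columns = columns; }
  }

  // Stand-in for SparkSchemaUtil.convert(icebergSchema).
  static EngineSchema convert(IcebergSchema schema) {
    return new EngineSchema(schema.columns);
  }

  // Proposed ordering: icebergSchema, fileSchema, engineSchema.
  // When no engine schema is supplied, it is derived from the Iceberg schema.
  static String buildWriter(
      IcebergSchema icebergSchema, FileSchema fileSchema, EngineSchema engineSchema) {
    EngineSchema effective = engineSchema != null ? engineSchema : convert(icebergSchema);
    return "file=" + fileSchema.columns + " engine=" + effective.columns;
  }

  // Existing two-argument entry point, kept as a thin delegate.
  static String buildWriter(EngineSchema engineSchema, FileSchema fileSchema) {
    return buildWriter(null, fileSchema, engineSchema);
  }

  public static void main(String[] args) {
    IcebergSchema iceberg = new IcebergSchema(Arrays.asList("id", "data"));
    FileSchema file = new FileSchema(Arrays.asList("id", "data"));

    // Caller passes only the Iceberg schema; the engine schema is derived.
    System.out.println(buildWriter(iceberg, file, null));

    // Caller passes an explicit engine schema through the legacy overload.
    System.out.println(buildWriter(new EngineSchema(Arrays.asList("id")), file));
  }
}
```

With this ordering the argument list reads from the table-level schema, to
the on-disk schema, to the engine-specific schema, which matches what the
callers already pass.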
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]