rdblue commented on code in PR #13445:
URL: https://github.com/apache/iceberg/pull/13445#discussion_r2289533392
##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -79,6 +84,123 @@ public static <T> ParquetValueWriter<T> buildWriter(StructType dfSchema, Message
         ParquetWithSparkSchemaVisitor.visit(dfSchema, type, new WriteBuilder(type));
   }
+  @SuppressWarnings("unchecked")
+  public static <T> ParquetValueWriter<T> buildWriter(Schema iSchema, MessageType type) {
Review Comment:
I think there's a problem with this method: we need to know the actual Spark type of each field in the row in order to call the correct getter on `InternalRow`.
When Spark is writing a `short`, Iceberg converts the type to an `int`. That works fine most of the time, but some row implementations (I think `UnsafeRow`) will fail when you call `getInt` instead of `getShort`. That's why we were preserving the Spark type here and passing it into `ints`. By changing this to iterate over the Iceberg type and convert, we lose the information that the Spark representation is a `short` and that it needs to be fetched that way. That happens on line 288.
This is also a problem that @pvary has been hitting. I'll take a closer look
and see if we should solve this some other way to unblock both.
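To illustrate the mismatch, here is a minimal, hypothetical sketch (a stand-in row class, not Spark's real `InternalRow`/`UnsafeRow`): a field declared as a `short` is only safe to read through `getShort`, and reading the same ordinal with `getInt` pulls in the neighboring field's bytes, which is the kind of failure a writer built from the converted Iceberg `int` type would hit.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a typed row (NOT Spark's InternalRow/UnsafeRow):
// fixed-width fields are packed back to back, so the getter must match the
// declared Spark type of the field to read the right bytes.
public class TypedGetterSketch {
  static final class PackedShortRow {
    private final ByteBuffer buf = ByteBuffer.allocate(8); // big-endian by default

    void setShort(int ordinal, short v) {
      buf.putShort(ordinal * 2, v); // each short field occupies 2 bytes
    }

    short getShort(int ordinal) {
      return buf.getShort(ordinal * 2); // reads exactly the field's 2 bytes
    }

    int getInt(int ordinal) {
      // Reading 4 bytes at a 2-byte field's offset also consumes the next
      // field's bytes -- the value is wrong even though the call "works".
      return buf.getInt(ordinal * 2);
    }
  }

  public static void main(String[] args) {
    PackedShortRow row = new PackedShortRow();
    row.setShort(0, (short) -1); // bytes FF FF
    row.setShort(1, (short) 7);  // bytes 00 07

    System.out.println(row.getShort(0)); // -1: getter matches the Spark type
    System.out.println(row.getInt(0));   // -65529 (0xFFFF0007): wrong value
  }
}
```

The packing scheme here is invented for illustration; the point from the comment is the same, though: once the writer only sees the converted Iceberg `int`, it no longer knows the field must be fetched with the `short` getter.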
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]