pvary commented on code in PR #13445:
URL: https://github.com/apache/iceberg/pull/13445#discussion_r2290984024


##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java:
##########
@@ -79,6 +84,123 @@ public static <T> ParquetValueWriter<T> buildWriter(StructType dfSchema, Message
         ParquetWithSparkSchemaVisitor.visit(dfSchema, type, new WriteBuilder(type));
   }
 
+  @SuppressWarnings("unchecked")
+  public static <T> ParquetValueWriter<T> buildWriter(Schema iSchema, MessageType type) {

Review Comment:
   For the record, here is the transformer which could be applied to the 
InternalRow object to transform getInt calls to getShort: 
https://github.com/apache/iceberg/pull/12298/files#diff-d4deb437ec91a0e4570aadb6a30fe700dc9eb261e82d42bdc2e23d41dbc6777a
   
   As @rdblue mentioned somewhere in our discussions, this solution adds an 
extra int -> Integer -> int boxing/unboxing step, and it makes the code a bit 
ugly, because it needs a few extra classes to wrap InternalRow, ArrayData and 
MapData.
   
   I have run a few tests and haven't seen any noticeable performance 
degradation, but we would need more testing before adopting this approach.
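
   To make the idea concrete, here is a minimal, self-contained sketch of such 
a wrapper. It is not the code from the linked PR: `Row`, `ShortRow` and 
`ShortToIntRow` are hypothetical stand-ins (the real transformer wraps Spark's 
InternalRow, ArrayData and MapData), and routing the read through a generic 
`IntFunction<Integer>` is only an assumption about how the extra 
int -> Integer -> int step could arise.

```java
import java.util.function.IntFunction;

final class ShortAsIntSketch {
  /** Hypothetical stand-in for the small part of InternalRow used here. */
  interface Row {
    int getInt(int ordinal);

    short getShort(int ordinal);
  }

  /** Row backed by short values, so only getShort is valid on it. */
  static final class ShortRow implements Row {
    private final short[] values;

    ShortRow(short... values) {
      this.values = values;
    }

    @Override
    public int getInt(int ordinal) {
      throw new UnsupportedOperationException("column " + ordinal + " holds a short");
    }

    @Override
    public short getShort(int ordinal) {
      return values[ordinal];
    }
  }

  /** Wrapper that answers getInt by reading a short from the wrapped row. */
  static final class ShortToIntRow implements Row {
    private final Row wrapped;
    // Assumption: the read goes through a generic function, so the widened
    // int is boxed to Integer here and unboxed again in getInt.
    private final IntFunction<Integer> reader;

    ShortToIntRow(Row wrapped) {
      this.wrapped = wrapped;
      this.reader = ordinal -> (int) wrapped.getShort(ordinal);
    }

    @Override
    public int getInt(int ordinal) {
      return reader.apply(ordinal); // Integer -> int unboxing
    }

    @Override
    public short getShort(int ordinal) {
      return wrapped.getShort(ordinal);
    }
  }

  public static void main(String[] args) {
    Row row = new ShortToIntRow(new ShortRow((short) 7, (short) -3));
    // A writer that only knows getInt can now consume the short column.
    System.out.println(row.getInt(0));
    System.out.println(row.getInt(1));
  }
}
```

   A writer built for int columns can call `getInt` on the wrapper without 
knowing that the underlying row stores shorts. The cost is the extra 
indirection per value, which is what the additional testing would need to 
measure.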



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

