fengjiajie commented on code in PR #8808: URL: https://github.com/apache/iceberg/pull/8808#discussion_r1361854341
########## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/data/FlinkParquetReaders.java: ########## @@ -262,7 +262,11 @@ public ParquetValueReader<?> primitive( switch (primitive.getPrimitiveTypeName()) { case FIXED_LEN_BYTE_ARRAY: case BINARY: - return new ParquetValueReaders.ByteArrayReader(desc); + if (expected.typeId() == Types.StringType.get().typeId()) { + return new StringReader(desc); + } else { + return new ParquetValueReaders.ByteArrayReader(desc); + } Review Comment: > I am concerned about the backward compatibility of this change. Someone might already depend on reading them as binary, and this change would break their use-case This modification is only applicable to cases where the iceberg definition is 'string' and parquet column is 'binary'. Previously, such cases would encounter the following exception (unit test can reproduce this exception): ``` java.lang.ClassCastException: [B cannot be cast to org.apache.flink.table.data.StringData at org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169) at org.apache.iceberg.flink.data.TestFlinkParquetReader.testReadBinaryFieldAsString(TestFlinkParquetReader.java:137) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org