Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

via GitHub Tue, 17 Oct 2023 03:04:40 -0700


fengjiajie commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1361854341



##########
flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/data/FlinkParquetReaders.java:
##########
@@ -262,7 +262,11 @@ public ParquetValueReader<?> primitive(
       switch (primitive.getPrimitiveTypeName()) {
         case FIXED_LEN_BYTE_ARRAY:
         case BINARY:
-          return new ParquetValueReaders.ByteArrayReader(desc);
+          if (expected.typeId() == Types.StringType.get().typeId()) {
+            return new StringReader(desc);
+          } else {
+            return new ParquetValueReaders.ByteArrayReader(desc);
+          }

Review Comment:
   > I am concerned about the backward compatibility of this change. Someone 
might already depend on reading them as binary, and this change would break 
their use-case
   
   This change only applies when the Iceberg schema declares the column as 
'string' and the Parquet column is 'binary'. Previously, such reads failed 
with the following exception (the unit test reproduces it):
   
   ```
   java.lang.ClassCastException: [B cannot be cast to org.apache.flink.table.data.StringData
        at org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169)
        at org.apache.iceberg.flink.data.TestFlinkParquetReader.testReadBinaryFieldAsString(TestFlinkParquetReader.java:137)
   ```
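   
   To illustrate why the old path fails and why the binary path is left untouched, here is a minimal standalone sketch. It does not use the Flink or Iceberg classes: `ExpectedType`, `readBinaryColumn`, and `getString` are hypothetical stand-ins for the expected Iceberg type, the reader selection in the diff, and a typed accessor that casts, respectively. A plain `String` stands in for `StringData`.

```java
import java.nio.charset.StandardCharsets;

public class BinaryAsStringSketch {
  // Hypothetical stand-in for the expected Iceberg type; not the real Iceberg enum.
  enum ExpectedType { STRING, BINARY }

  // Mirrors the branch in the diff: decode the bytes only when a string is expected,
  // otherwise hand back the raw byte[] exactly as before.
  static Object readBinaryColumn(byte[] raw, ExpectedType expected) {
    if (expected == ExpectedType.STRING) {
      return new String(raw, StandardCharsets.UTF_8);
    }
    return raw;
  }

  // Stand-in for a typed accessor: it simply casts the stored field.
  static String getString(Object field) {
    return (String) field;
  }

  public static void main(String[] args) {
    byte[] raw = "iceberg".getBytes(StandardCharsets.UTF_8);

    // Old behaviour: a byte[] is stored although the schema expects a string,
    // so the cast in the accessor throws.
    try {
      getString(raw);
    } catch (ClassCastException e) {
      System.out.println("old path fails: ClassCastException");
    }

    // New behaviour: the value is decoded when a string is expected.
    System.out.println("new path: " + getString(readBinaryColumn(raw, ExpectedType.STRING)));

    // Columns declared as binary still come back as byte[].
    byte[] kept = (byte[]) readBinaryColumn(raw, ExpectedType.BINARY);
    System.out.println("binary path keeps " + kept.length + " bytes");
  }
}
```

   The point of the sketch is that only the string-expected case changes; a caller that declared the column as binary receives the same `byte[]` as before.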



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

Reply via email to