fengjiajie commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1769826927

> > * It seems that ORC is not experiencing this issue because it creates value reader based on the iceberg column types.
> > * Avro reads the fields entirely based on the file type, which seems to be problematic. However, it doesn't have significant issues under Parquet because Avro natively supports STRING and BYTES types, whereas Parquet only has the Binary type (whether the field is a String is determined by additional annotations or external metadata).
>
> Thanks for the explanation!
>
> > * The data type read should be consistent with the iceberg column type, so I think Spark should also incorporate this modification.
>
> How hard would it be to incorporate this to the Spark reader as well? I am uncomfortable with these kind of fixes which are applied only to one of the engines. If it is not too complicated we should add it here, if not, then we need to create a different PR.
>
> > * Additionally, Iceberg has a UUID type, which seems to be supported in Spark but not in Flink: [Spark 3.3: Add read and write support for UUIDs #7496](https://github.com/apache/iceberg/pull/7496)
>
> I think this is a bigger nut to crack. Probably worth another PR in Flink to fix this.

I made modifications on Spark 3.5. Before the changes, the following exception would occur:

```
[B cannot be cast to org.apache.spark.unsafe.types.UTF8String
java.lang.ClassCastException: [B cannot be cast to org.apache.spark.unsafe.types.UTF8String
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String(rows.scala:45)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String$(rows.scala:45)
	at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:165)
	at org.apache.spark.sql.catalyst.InternalRow.getString(InternalRow.scala:35)
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org