fengjiajie commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1769826927

   > > * ORC does not seem to have this issue because it creates its value readers based on the Iceberg column types.
   > > * Avro reads the fields entirely based on the file type, which seems problematic. However, this causes fewer issues than it does with Parquet, because Avro natively supports both STRING and BYTES types, whereas Parquet only has the Binary type (whether the field is a String is determined by additional annotations or external metadata).
   > 
   > Thanks for the explanation!
   > 
   > > * The data type read should be consistent with the Iceberg column type, so I think Spark should also incorporate this modification.
   > 
   > How hard would it be to incorporate this into the Spark reader as well? I am uncomfortable with this kind of fix being applied to only one of the engines. If it is not too complicated, we should add it here; if it is, then we need to create a different PR.
   > 
   > > * Additionally, Iceberg has a UUID type, which seems to be supported in Spark but not in Flink: [Spark 3.3: Add read and write support for UUIDs #7496](https://github.com/apache/iceberg/pull/7496)
   > 
   > I think this is a bigger nut to crack. Probably worth another PR in Flink to fix this.
   
   I made modifications on Spark 3.5. Before the changes, the following exception would occur:
   
   ```
   [B cannot be cast to org.apache.spark.unsafe.types.UTF8String
   java.lang.ClassCastException: [B cannot be cast to org.apache.spark.unsafe.types.UTF8String
        at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String(rows.scala:45)
        at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String$(rows.scala:45)
        at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:165)
        at org.apache.spark.sql.catalyst.InternalRow.getString(InternalRow.scala:35)
   ```
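
To illustrate the failure mode without pulling in Spark, here is a minimal, self-contained sketch. It is not the code from this PR: `Utf8Text` is a hypothetical stand-in for `UTF8String`, `ExpectedType` stands in for the Iceberg column type, and `convert` / `getText` are made-up helpers. It only shows that a raw `byte[]` stored in a row slot fails the cast, while converting by the expected column type avoids it.

```java
import java.nio.charset.StandardCharsets;

public class ReadTypeSketch {
  // Hypothetical stand-in for the engine's string type (the role UTF8String plays in Spark).
  static final class Utf8Text {
    final byte[] bytes;

    Utf8Text(byte[] bytes) {
      this.bytes = bytes;
    }

    @Override
    public String toString() {
      return new String(bytes, StandardCharsets.UTF_8);
    }
  }

  // Hypothetical expected column types, standing in for the Iceberg schema.
  enum ExpectedType { STRING, BINARY }

  // Convert the raw binary value according to the expected column type,
  // not according to whether the file annotated the column as a string.
  static Object convert(byte[] raw, ExpectedType expected) {
    return expected == ExpectedType.STRING ? new Utf8Text(raw) : raw;
  }

  // Mimics a row accessor that blindly casts the stored value.
  static Utf8Text getText(Object[] row, int pos) {
    return (Utf8Text) row[pos];
  }

  public static void main(String[] args) {
    byte[] raw = "iceberg".getBytes(StandardCharsets.UTF_8);

    // Reader chosen from the file type only: the slot holds a byte[].
    Object[] fileTyped = {raw};
    try {
      getText(fileTyped, 0);
    } catch (ClassCastException e) {
      System.out.println("cast failed: " + e.getMessage());
    }

    // Reader chosen from the expected column type: the slot holds a string value.
    Object[] schemaTyped = {convert(raw, ExpectedType.STRING)};
    System.out.println(getText(schemaTyped, 0));
  }
}
```

The first access throws a `ClassCastException` of the same kind as in the trace above; the second prints `iceberg`.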


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

