szehon-ho commented on issue #8045: URL: https://github.com/apache/iceberg/issues/8045#issuecomment-1631934972
I think I know the issue. It is part of the code for `removeDanglingDeletes`. For each partition of delete files, I try to find 'live' data files so I can do the cleanup. In this method, https://github.com/apache/iceberg/blob/master/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackPositionDeletesRewriter.java#L122 , I use the DeleteFile's partition data directly to query the data_files table. I thought it would work, as the data_files table uses the transformed partition values, just as the DeleteFile partition data should. But the partition data of a DeleteFile is not the same type as the one exposed in the Spark metadata table. In particular, there is a difference between the real and logical Avro types as defined in the spec: https://iceberg.apache.org/spec/#avro

Summary: the issue does not specifically affect partition transforms. It affects partitions whose Avro type differs from the logical Avro type, i.e. date, time, etc. I'm investigating a fix involving adding a conversion from the Avro type to the logical type.
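To illustrate the type mismatch (this is a hedged sketch, not the actual patch): per the spec, a `date` value is physically stored as an Avro `int` counting days from the Unix epoch, while the Spark metadata table exposes the logical DATE type. A fix along the lines described would convert the internal representation before querying; the class and method names below are hypothetical:

```java
import java.time.LocalDate;

// Hypothetical sketch of converting an internal Avro-typed partition value
// (int days from 1970-01-01) into the logical date type that a Spark
// metadata table such as data_files would expose.
public class PartitionValueConversion {

    // Convert the internal int days-from-epoch into the logical date value.
    static LocalDate daysToDate(int daysFromEpoch) {
        return LocalDate.ofEpochDay(daysFromEpoch);
    }

    public static void main(String[] args) {
        // Internal value 19000 corresponds to the logical date 2022-01-08,
        // which is what the metadata table would expose for comparison.
        System.out.println(daysToDate(19000)); // prints 2022-01-08
        System.out.println(daysToDate(0));     // prints 1970-01-01
    }
}
```

Without such a conversion, an equality comparison between the raw `int` partition value and the logical date string from the metadata table would never match, which is consistent with the symptom described.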
