enriquh opened a new issue, #14707: URL: https://github.com/apache/iceberg/issues/14707
### Feature Request / Improvement

**Feature description**

Implement the schema visitor for the variant data type in Parquet. This is currently not implemented in the [code](https://github.com/apache/iceberg/blob/cf27769762d678a8a790481318b6346c2dd480ff/parquet/src/main/java/org/apache/iceberg/parquet/TypeWithSchemaVisitor.java#L242).

**Steps to reproduce**

1. Create a table with a variant field.
2. Perform a MERGE INTO operation to update records on the target table, using a variant property in the condition.

**Result**

`UnsupportedOperationException: Not implemented for variant at org.apache.iceberg.parquet.TypeWithSchemaVisitor.variant(TypeWithSchemaVisitor.java:242)`

The issue also happens when sub-variant extractions are not included in the condition.

> spark.sql(f"""
> CREATE TABLE {table_name} (
> id BIGINT,
> variant_data VARIANT
> )
> USING iceberg
> TBLPROPERTIES ('format-version' = '3')
> """)

> merge_sql_2 = f"""
> MERGE INTO {table_name} AS target
> USING merge_source AS source
> ON variant_get(target.variant_data, '$.name', 'string') = variant_get(source.variant_data, '$.name', 'string')
> AND target.id = source.id
> WHEN MATCHED THEN
> UPDATE SET target.variant_data = source.variant_data
> WHEN NOT MATCHED THEN
> INSERT (id, variant_data) VALUES (source.id, source.variant_data)
> """

**Expected results**

The ability to execute MERGE operations on tables with variant fields, using [variant_get](https://spark.apache.org/docs/4.0.0/api/python/reference/pyspark.sql/api/pyspark.sql.functions.variant_get.html) in Spark to reduce the amount of data scanned.

**Environment details**

* Spark 4.0
* Iceberg 1.11 (built from main)

### Query engine

Spark

### Willingness to contribute

- [ ] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- [x] I cannot contribute this improvement/feature at this time

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
For additional commands, e-mail: [email protected]
