hariuserx opened a new issue, #14123:
URL: https://github.com/apache/iceberg/issues/14123

   ### Apache Iceberg version
   
   1.10.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Spark 4.0 introduced the Variant type
   (https://www.databricks.com/blog/introducing-apache-spark-40), and
   Iceberg 1.10 also adds Variant type support.
   
   When migrating an existing Spark table that contains a variant column with
   the `CALL catalog_name.system.snapshot` procedure, we get an
   `UnsupportedOperationException`. I have only checked this for Parquet.
   
   The root cause appears to be the format we get from
   `CatalogTable sourceTable = spark.sessionState().catalog().getTableMetadata(sourceTableIdent);`
   in `SparkTableUtil.java --> importUnpartitionedSparkTable`. With a variant
   column, the serde is `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe`;
   without one, it is
   `org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe`.
   `TableMigrationUtil.listPartition` fails to recognize the former as a
   supported format.
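   
   A minimal sketch of the suspected failure mode, assuming the file format is
   inferred by substring-matching the serde class name. `detectFormat` is a
   hypothetical stand-in written for illustration; it is not the actual
   `TableMigrationUtil` code, and the exception message is made up.
   
   ```java
   import java.util.Locale;

   public class FormatDetectionSketch {
     // Hypothetical helper: infers a file format from a serde (or provider) name
     // by substring match. This is an assumption about how detection works.
     public static String detectFormat(String serdeOrProvider) {
       String name = serdeOrProvider.toLowerCase(Locale.ROOT);
       if (name.contains("avro")) {
         return "avro";
       } else if (name.contains("parquet")) {
         return "parquet";
       } else if (name.contains("orc")) {
         return "orc";
       }
       throw new UnsupportedOperationException("Unknown format: " + serdeOrProvider);
     }

     public static void main(String[] args) {
       // Serde reported for the table without a variant column: recognized.
       System.out.println(
           detectFormat("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"));

       // Serde reported for the table with a variant column: no format name in it.
       try {
         detectFormat("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
       } catch (UnsupportedOperationException e) {
         System.out.println("Rejected: " + e.getMessage());
       }
     }
   }
   ```
   
   If detection really works this way, `LazySimpleSerDe` can never match,
   because its class name carries no format name, even though the underlying
   files are Parquet.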
   
   
   Once this is fixed, the next failure could be due to the lack of `Variant`
   type handling in `Conversions.java --> fromPartitionString`.
   
   
   **Reproduction steps:**
   
   This can be verified with a unit test in the Spark 4.0
   `TestSnapshotTableProcedure.java`:
   
   ```java
   @TestTemplate
   public void testSnapshot() throws IOException {
     String location =
         Files.createTempDirectory(temp, "junit").toFile().toString();
     // Plain Spark Parquet table with a variant column
     sql(
         "CREATE TABLE %s (id bigint NOT NULL, data variant) "
             + "USING parquet LOCATION '%s'",
         SOURCE_NAME, location);
     sql(
         "INSERT INTO TABLE %s VALUES "
             + "(1, parse_json('{\"key\": 123, \"data\": [4, 5, \"str\"]}'))",
         SOURCE_NAME);
     sql("select * from %s", SOURCE_NAME); // Works
     sql(
         "select id, variant_get(data, '$.key', 'int') from %s",
         SOURCE_NAME); // Works

     // Fails with UnsupportedOperationException
     Object result =
         scalarSql(
             "CALL %s.system.snapshot('%s', '%s', "
                 + "properties => map('format-version','3'))",
             catalogName, SOURCE_NAME, tableName);
   }
   ```
   
   
   I'm not sure whether this should be filed as a feature request or a bug.
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

