b7wch opened a new issue, #12510: URL: https://github.com/apache/iceberg/issues/12510
### Query engine

**Versions:**

- Spark: 3.5.5
- Iceberg: `iceberg-spark-runtime-3.5_2.12:1.8.1`
- Hive: 4.0.0
- Catalog: HMS (Hive Metastore)

**Spark SQL command:**

```
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1 \
  --jars /Users/xxx/Downloads/hadoop-aws-3.3.4.jar,/Users/mask/Downloads/aws-java-sdk-bundle-1.12.367.jar \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.local.type=hive \
  --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.local.type=hive \
  --conf spark.sql.catalog.local.uri=thrift://localhost:49083 \
  --conf spark.sql.catalog.local.hadoop.fs.s3a.endpoint=http://127.0.0.1:19000 \
  --conf spark.sql.catalog.local.hadoop.fs.s3a.access.key=xxx \
  --conf spark.sql.catalog.local.hadoop.fs.s3a.secret.key=xxx \
  --conf spark.sql.catalog.local.hadoop.fs.s3a.path.style.access=true \
  --conf spark.sql.catalog.local.hadoop.fs.s3a.connection.ssl.enable=false \
  --conf spark.sql.catalog.local.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  --conf spark.hadoop.fs.s3a.endpoint=http://127.0.0.1:19000 \
  --conf spark.hadoop.fs.s3a.access.key=xxx \
  --conf spark.hadoop.fs.s3a.secret.key=xxx \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.connection.ssl.enable=false \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
```

(Note: `spark.sql.catalog.local` is set twice in this command, first to `SparkSessionCatalog` and then to `SparkCatalog`; when the same key is passed to `--conf` more than once, the last value wins, so the session here runs with `SparkCatalog`.)

**Table info (from Spark SQL, `DESCRIBE FORMATTED x`):**

```
i	int

# Metadata Columns
_spec_id	int
_partition	struct<>
_file	string
_pos	bigint
_deleted	boolean

# Detailed Table Information
Name	local.default.x
Type	MANAGED
Location	s3a://test/hive-test/warehouse/x
Provider	iceberg
Table Properties	[EXTERNAL=TRUE,bucketing_version=2,current-snapshot-id=3772775424106745344,format=iceberg/parquet,format-version=2,last_modified_by=root,last_modified_time=1741830227,serialization.format=1,storage_handler=org.apache.iceberg.mr.hive.HiveIcebergStorageHandler,write.parquet.compression-codec=zstd]
```

**Step 1.** By default, after inserting data into table `x` with Spark SQL, the table's storage metadata in the Hive Metastore is changed: the InputFormat/OutputFormat are reset to the generic `org.apache.hadoop.mapred.FileInputFormat` and `org.apache.hadoop.mapred.FileOutputFormat` instead of the Iceberg handler classes such as `org.apache.iceberg.mr.hive.HiveIcebergOutputFormat`, as `DESCRIBE FORMATTED x;` run from Hive shows:

| col_name | data_type | comment |
|-------------------------------|----------------------------------------------------|----------------------------------------------------|
| i | int | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | default | NULL |
| OwnerType: | USER | NULL |
| Owner: | root | NULL |
| CreateTime: | Wed Mar 12 07:44:04 UTC 2025 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Retention: | 0 | NULL |
| Location: | s3a://test/hive-test/warehouse/x | NULL |
| Table Type: | EXTERNAL_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | DO_NOT_UPDATE_STATS | true |
| | EXTERNAL | TRUE |
| | bucketing_version | 2 |
| | current-schema | {\"type\":\"struct\",\"schema-id\":0,\"fields\":[{\"id\":1,\"name\":\"i\",\"required\":false,\"type\":\"int\"}]} |
| | current-snapshot-id | 2295160577663271801 |
| | current-snapshot-summary | {\"spark.app.id\":\"local-1741830134636\",\"added-data-files\":\"1\",\"added-records\":\"1\",\"added-files-size\":\"404\",\"changed-partition-count\":\"1\",\"total-records\":\"3\",\"total-files-size\":\"1212\",\"total-data-files\":\"3\",\"total-delete-files\":\"0\",\"total-position-deletes\":\"0\",\"total-equality-deletes\":\"0\",\"engine-version\":\"3.5.5\",\"app-id\":\"local-1741830134636\",\"engine-name\":\"spark\",\"iceberg-version\":\"Apache Iceberg 1.8.1 (commit 9ce0fcf0af7becf25ad9fc996c3bad2afdcfd33d)\"} |
| | current-snapshot-timestamp-ms | 1741831197720 |
| | iceberg.orc.files.only | false |
| | last_modified_by | root |
| | last_modified_time | 1741830227 |
| | metadata_location | s3a://test/hive-test/warehouse/x/metadata/00005-81b843c4-7c01-47e3-9264-5dfe1d0c38f5.metadata.json |
| | numFiles | 3 |
| | numRows | 3 |
| | parquet.compression | zstd |
| | previous_metadata_location | s3a://test/hive-test/warehouse/x/metadata/00004-35e20a60-b52c-40be-ac45-3d6a8acd641f.metadata.json |
| | rawDataSize | 0 |
| | serialization.format | 1 |
| | snapshot-count | 3 |
| | table_type | ICEBERG |
| | totalSize | 1212 |
| | transient_lastDdlTime | 1741830227 |
| | uuid | ac28f3d4-54ae-4546-a2e3-3f0381ab3dff |
| | write.parquet.compression-codec | zstd |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.FileInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.mapred.FileOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | 0 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |

**Step 2.** Executing `select * from x` in Hive now fails with:

```
ERROR : Failed with exception java.io.IOException:java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork!
java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork!
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:628)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:535)
	at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:194)
	at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
	at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork!
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:237)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:378)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:310)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:566)
	... 19 more
Caused by: java.lang.RuntimeException: java.lang.InstantiationException
	at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:90)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:229)
	... 22 more
Caused by: java.lang.InstantiationException
	at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:88)
	... 23 more
```

**Step 3.** After executing `ALTER TABLE x SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');` in Hive, running `select * from x` succeeds:

```
INFO : Compiling command(queryId=root_20250313020615_70580b86-65cc-4814-bdbd-864d06a27c3d): select * from x
INFO : No Stats for default@x, Columns: i
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:x.i, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=root_20250313020615_70580b86-65cc-4814-bdbd-864d06a27c3d); Time taken: 0.094 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=root_20250313020615_70580b86-65cc-4814-bdbd-864d06a27c3d): select * from x
INFO : Completed executing command(queryId=root_20250313020615_70580b86-65cc-4814-bdbd-864d06a27c3d); Time taken: 0.0 seconds
```

| x.i |
|------|
| 2 |
| 2 |
| 1 |

However, if I insert data into `x` from Spark SQL once more, the table reverts to the broken state described in Step 1. Could anyone give me some suggestions?
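A possibly relevant lever (an assumption to verify against these exact versions, not a confirmed fix): Iceberg's `HiveCatalog` only writes Hive-readable storage metadata (the `HiveIcebergStorageHandler`, SerDe, and InputFormat/OutputFormat entries) to the HMS on commit when Hive engine support is enabled, via the documented table property `engine.hive.enabled` or the engine-wide Hadoop configuration key `iceberg.engine.hive.enabled`. If the Spark writes above are committing without that flag, that could explain why each insert resets the storage descriptor. A hedged sketch:

```sql
-- Sketch, not a confirmed fix: ask Iceberg's HiveCatalog to keep
-- Hive-readable metadata (storage handler, SerDe, input/output formats)
-- on every commit to this table. Run from the Spark SQL session.
ALTER TABLE local.default.x SET TBLPROPERTIES ('engine.hive.enabled'='true');
```

Alternatively, the engine-wide switch can be set on the writing engine's Hadoop configuration, e.g. adding `--conf spark.hadoop.iceberg.engine.hive.enabled=true` to the `spark-sql` command line, so the property does not have to be set on every table.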