ideal opened a new issue, #7739:
URL: https://github.com/apache/iceberg/issues/7739

   ### Apache Iceberg version
   
   1.2.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   With Spark 3.2.4 in standalone mode, the table is described as follows:
   
   ```
   > desc extended my_test_table;
   23/05/30 20:21:47 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
   23/05/30 20:21:47 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
   a                       string                  a                   
   b                       string                  b                   
   c                       string                  c                   
                                                                       
   # Detailed Table Information                                                
   Database                my_test_iceberg                                
   Table                   my_test_table                            
   Owner                   root                                        
   Created Time            Tue May 30 16:28:17 CST 2023                        
   Last Access             UNKNOWN                                     
   Created By              Spark 2.2 or prior                          
   Type                    EXTERNAL                                    
   Provider                hive                                        
   Comment                 my_test_table                            
   Table Properties        [
     current-schema={"type":"struct","schema-id":0,"fields":[{"id":1,"name":"a","required":false,"type":"string","doc":"a"},{"id":2,"name":"b","required":false,"type":"string","doc":"b"},{"id":3,"name":"c","required":false,"type":"string","doc":"c"}]},
     current-snapshot-id=1142796867698349657,
     current-snapshot-summary={"spark.app.id":"app-20230530113600-0013","added-data-files":"1","added-records":"1","added-files-size":"860","changed-partition-count":"1","total-records":"24","total-files-size":"20654","total-data-files":"24","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"},
     current-snapshot-timestamp-ms=1685438806641,
     default-partition-spec={"spec-id":0,"fields":[{"name":"c","transform":"identity","source-id":3,"field-id":1000}]},
     engine.hive.enabled=true,
     external.table.purge=TRUE,
     metadata_location=hdfs://xxxx/user/warehouse/my_test_iceberg/my_test_table/metadata/00024-665493b0-a47d-4861-8ebd-767f868f8fda.metadata.json,
     previous_metadata_location=hdfs://xxxx/user/warehouse/my_test_iceberg/my_test_table/metadata/00023-87b8cb31-15ab-45cb-9b73-d8085549e2c1.metadata.json,
     snapshot-count=24,
     storage_handler=org.apache.iceberg.mr.hive.HiveIcebergStorageHandler,
     table_type=ICEBERG,
     transient_lastDdlTime=1685435297,
     uuid=5d212398-0457-4058-b400-936e0533fcd6]
   Statistics              20654 bytes, 24 rows                        
   Location                hdfs://xxxx/user/warehouse/my_test_iceberg/my_test_table
   Serde Library           org.apache.iceberg.mr.hive.HiveIcebergSerDe
   InputFormat             org.apache.iceberg.mr.hive.HiveIcebergInputFormat
   OutputFormat            org.apache.iceberg.mr.hive.HiveIcebergOutputFormat
   Partition Provider      Catalog
   ```
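
   For reference, the snapshot details listed in the table properties above can also be inspected through Iceberg's SQL metadata tables. This is only a sketch, and it assumes the table is also reachable through the `hive_prod` catalog configured in the spark-sql command below:

   ```
   -- assumption: the table is registered under the same names in hive_prod
   SELECT snapshot_id, committed_at, operation
   FROM hive_prod.my_test_iceberg.my_test_table.snapshots;
   ```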
   
   And when running:
   ```
   bin/spark-sql --master spark://${spark-master}:7077 \
     --conf spark.driver.host=${current-host-ip} \
     --conf spark.hive.metastore.uris=thrift://${metastore-service}:9083 \
     --conf spark.sql.catalog.hive_prod=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.hive_prod.type=hive \
     --conf spark.sql.catalog.hive_prod.warehouse=hdfs://xxxx/hivewarehouse/iceberg \
     --jars iceberg-hive-runtime-1.2.1.jar,iceberg-spark-runtime-3.2_2.12-1.2.1.jar
   
   > insert into my_test_table values ('a1','b1','c1');
   ```
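
   As a point of comparison, the same insert can also be expressed against the `hive_prod` catalog configured above. A minimal sketch, assuming the table is registered under the same database and table names in that catalog:

   ```
   -- assumption: my_test_iceberg.my_test_table is visible through hive_prod
   INSERT INTO hive_prod.my_test_iceberg.my_test_table VALUES ('a1', 'b1', 'c1');
   ```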
   
   The insert fails with the following exception:
   ```
   org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
           at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:274)
           at org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:132)
           at org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:105)
           at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:161)
           at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:146)
           at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:290)
           at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$16(FileFormatWriter.scala:229)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:131)
           at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   Caused by: java.lang.NullPointerException
           at org.apache.iceberg.mr.hive.TezUtil$TaskAttemptWrapper.<init>(TezUtil.java:105)
           at org.apache.iceberg.mr.hive.TezUtil.taskAttemptWrapper(TezUtil.java:78)
           at org.apache.iceberg.mr.hive.HiveIcebergOutputFormat.writer(HiveIcebergOutputFormat.java:73)
           at org.apache.iceberg.mr.hive.HiveIcebergOutputFormat.getHiveRecordWriter(HiveIcebergOutputFormat.java:58)
           at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:286)
           at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:271)
           ... 14 more
   ```
   
   Has anyone had this problem before? Thanks.
   

