palanik1 opened a new issue, #8419:
URL: https://github.com/apache/iceberg/issues/8419

   ### Query engine
   
   Setup:
   Spark: 3.3.3
   Scala: 2.12.15
   
   from pyspark import SparkConf
   from pyspark.sql import SparkSession

   sparkConf = (SparkConf()
       .set("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.3.1,"
            "software.amazon.awssdk:bundle:2.20.18,"
            "software.amazon.awssdk:url-connection-client:2.20.18")
       .set("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
       .set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
       .set("spark.sql.catalog.iceberg.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
       .set("spark.sql.defaultCatalog", "iceberg")
       .set("spark.sql.catalog.iceberg.type", "rest")
   )
   spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
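   To confirm the packages setting is actually applied, I can print the resolved conf from the live session (just a driver-side sanity check; as far as I understand it does not tell me whether the executors resolved and loaded the jar):
   ```
   # Driver-side sanity check: show the spark.jars.packages value the running
   # session actually picked up. This does not prove the executors have the jar.
   print(spark.sparkContext.getConf().get("spark.jars.packages", "<not set>"))
   ```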
   
   ### Question
   
   I have a simple PySpark program that uses spark.sql to create a table and insert into it:
   ```
   df = spark.sql("CREATE TABLE iceberg.db.sample_table (id bigint, data 
string) USING iceberg")
   df = spark.sql("INSERT INTO TABLE iceberg.db.sample_table VALUES (1,'a')")
   ```
   
   The INSERT fails on the executors with a java.lang.ClassNotFoundException for org.apache.iceberg.spark.source.SparkWrite$WriterFactory, as shown below, even though org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.3.1 is included in the Spark conf via spark.jars.packages. (A quick driver-side class check is included after the trace.)
   ```
   23/08/29 18:39:26 WARN TaskSetManager: Lost task 0.0 in stage 6.0 (TID 5) 
(10.168.113.20 executor 0): java.lang.ClassNotFoundException: 
org.apache.iceberg.spark.source.SparkWrite$WriterFactory
           at 
java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
           at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
           at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
           at java.base/java.lang.Class.forName0(Native Method)
           at java.base/java.lang.Class.forName(Class.java:398)
           at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:71)
           at 
java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2003)
           at 
java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1870)
           at 
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2201)
           at 
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687)
           at 
java.base/java.io.ObjectInputStream.readArray(ObjectInputStream.java:2134)
           at 
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1675)
           at 
java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2496)
           at 
java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2390)
           at 
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2228)
           at 
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687)
           at 
java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2496)
           at 
java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2390)
           at 
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2228)
           at 
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687)
           at 
java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:489)
           at 
java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:447)
           at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)
           at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
           at org.apache.spark.scheduler.Task.run(Task.scala:136)
           at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
           at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:829)
   
   23/08/29 18:39:26 ERROR TaskSetManager: Task 0 in stage 6.0 failed 4 times; 
aborting job
   23/08/29 18:39:26 ERROR AppendDataExec: Data source write support 
IcebergBatchWrite(table=iceberg.openshift.sample_table1, format=PARQUET) is 
aborting.
   23/08/29 18:39:26 ERROR AppendDataExec: Data source write support 
IcebergBatchWrite(table=iceberg.openshift.sample_table1, format=PARQUET) 
aborted.
   Traceback (most recent call last):
     File "~/examples/spark-example.py", line 93, in <module>
       df = spark.sql("INSERT INTO TABLE iceberg.db.sample_table1 VALUES 
(1,'a')")
     File "/root/spark/spark-3.3.3-bin-hadoop3/python/pyspark/sql/session.py", 
line 1034, in sql
       return DataFrame(self._jsparkSession.sql(sqlQuery), self)
     File 
"/root/spark/spark-3.3.3-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py",
 line 1321, in __call__
     File "/root/spark/spark-3.3.3-bin-hadoop3/python/pyspark/sql/utils.py", 
line 190, in deco
       return f(*a, **kw)
     File 
"/root/spark/spark-3.3.3-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py",
 line 326, in get_return_value
   
           at 
org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:332)
           at 
org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:331)
           at 
org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:244)
           at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
           at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
           at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
           at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
           at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
           at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
           at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
           at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
           at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
           at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
           at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
           at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
           at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
           at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
           at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
           at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
           at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
           at 
org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
           at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
           at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
           at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
           at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
           at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
           at 
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
           at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
           at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invo
           at java.base/java.lang.Class.forName(Class.java:398)
           at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:71)
           at 
java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2003)
           at 
java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1870)
           at 
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2201)
           at 
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687)
           at 
java.base/java.io.ObjectInputStream.readArray(ObjectInputStream.java:2134)
           at 
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1675)
           at 
java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2496)
           at 
java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2390)
           at 
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2228)
           at 
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687)
           at 
java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2496)
           at 
java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2390)
           at 
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2228)
           at 
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1687)
           at 
java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:489)
           at 
java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:447)
           at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)
           at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
           at org.apache.spark.scheduler.Task.run(Task.scala:136)
           at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
           at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:829)
   
   Driver stacktrace:
           at 
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2668)
           at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2604)
           at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2603)
           at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
           at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
           at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGSchedule
   ```
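   
   For what it's worth, here is a quick driver-side check I can run (a rough sketch that goes through PySpark's internal _jvm handle; it only tells me about the driver's classpath, not the executors, which is where the task actually fails):
   ```
   # Ask the driver JVM to load one of the Iceberg runtime classes via py4j.
   # Success here only means the jar is on the driver's classpath; the
   # ClassNotFoundException above is thrown on the executors.
   spark.sparkContext._jvm.java.lang.Class.forName(
       "org.apache.iceberg.spark.source.SparkWrite")
   ```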
   
   Any thoughts on what I might be missing? Thank you!

