jornfranke opened a new issue, #5970:
URL: https://github.com/apache/iceberg/issues/5970

   ### Apache Iceberg version
   
   0.14.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   This happens only since about v0.14.0; it does not happen with v0.13.1. I deleted the tables before testing each version to make it exactly reproducible.
   I execute the following code:
   ```
   from pyspark.sql import SparkSession

   iceberg_version = "0.14.1"
   spark_version = "3.2"
   scala_version = "2.12"

   spark = (
       SparkSession.builder.appName("Spark-Test-App")
       .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
       .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
       .config("spark.sql.catalog.spark_catalog.type", "hive")
       .config("spark.jars", f"https://repo1.maven.org/maven2/maven-remote/org/apache/iceberg/iceberg-spark-runtime-{spark_version}_{scala_version}/{iceberg_version}/iceberg-spark-runtime-{spark_version}_{scala_version}-{iceberg_version}.jar")
       .getOrCreate()
   )

   iceberg_table = "default.spark_iceberg_test_table"
   spark.sql(f"CREATE TABLE {iceberg_table} (id bigint, data string) USING iceberg")
   spark.sql(f"INSERT INTO {iceberg_table} VALUES (1, 'a'), (2, 'b'), (3, 'c')")
   ```
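
   For completeness, here is a minimal, hypothetical sketch of the same session setup that resolves the runtime jar through Maven coordinates via `spark.jars.packages` instead of pointing `spark.jars` at a URL. The coordinate string below is my assumption of the equivalent artifact, not the configuration that produced the failure above.
   ```
   from pyspark.sql import SparkSession

   iceberg_version = "0.14.1"
   spark_version = "3.2"
   scala_version = "2.12"

   # Same catalog settings as in the reproduction; only the jar resolution differs.
   # Assumption: Maven Central is reachable; Spark resolves the coordinate via Ivy
   # and distributes the resolved jar to the executors as well.
   spark = (
       SparkSession.builder.appName("Spark-Test-App")
       .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
       .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
       .config("spark.sql.catalog.spark_catalog.type", "hive")
       .config("spark.jars.packages", f"org.apache.iceberg:iceberg-spark-runtime-{spark_version}_{scala_version}:{iceberg_version}")
       .getOrCreate()
   )
   ```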
   
   I get the following error message:
   ```
   22/10/12 13:50:07 777 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
   22/10/12 13:50:07 798 ERROR AppendDataExec: Data source write support IcebergBatchWrite(table=spark_catalog.default.spark_iceberg_test_table, format=PARQUET) is aborting.
   22/10/12 13:50:07 804 ERROR AppendDataExec: Data source write support IcebergBatchWrite(table=spark_catalog.default.spark_iceberg_test_table, format=PARQUET) aborted.
   ---------------------------------------------------------------------------
   Py4JJavaError                             Traceback (most recent call last)
   /tmp/ipykernel_1593/917831532.py in <module>
   ----> 1 spark.sql(f"INSERT INTO {iceberg_table} VALUES (1, 'a'), (2, 'b'), (3, 'c')")
   /opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py in sql(self, sqlQuery)
       721         [Row(f1=1, f2='row1'), Row(f1=2, f2='row2'), Row(f1=3, f2='row3')]
       722         """
   --> 723         return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
       724 
       725     def table(self, tableName):
   /usr/local/lib/python3.9/site-packages/py4j/java_gateway.py in __call__(self, *args)
      1302 
      1303         answer = self.gateway_client.send_command(command)
   -> 1304         return_value = get_return_value(
      1305             answer, self.gateway_client, self.target_id, self.name)
      1306 
   /opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py in deco(*a, **kw)
       109     def deco(*a, **kw):
       110         try:
   --> 111             return f(*a, **kw)
       112         except py4j.protocol.Py4JJavaError as e:
       113             converted = convert_exception(e.java_exception)
   /usr/local/lib/python3.9/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
       324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
       325             if answer[1] == REFERENCE_TYPE:
   --> 326                 raise Py4JJavaError(
       327                     "An error occurred while calling {0}{1}{2}.\n".
       328                     format(target_id, ".", name), value)
   Py4JJavaError: An error occurred while calling o59.sql.
   : org.apache.spark.SparkException: Writing job aborted
        at org.apache.spark.sql.errors.QueryExecutionErrors$.writingJobAbortedError(QueryExecutionErrors.scala:613)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:389)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:333)
        at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.writeWithV2(WriteToDataSourceV2Exec.scala:236)
        at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:312)
        at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:311)
        at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:236)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:111)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:111)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
        at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (100.100.217.57 executor 1): java.io.InvalidClassException: org.apache.iceberg.Schema; local class incompatible: stream classdesc serialVersionUID = 3320367012418887609, local class serialVersionUID = -8857144469361102787
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2403)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2352)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2351)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2351)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1109)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1109)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1109)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2591)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2533)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:898)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2214)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:357)
        ... 47 more
    Caused by: java.io.InvalidClassException: org.apache.iceberg.Schema; local class incompatible: stream classdesc serialVersionUID = 3320367012418887609, local class serialVersionUID = -8857144469361102787
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more
   ```
   
   As mentioned, the exact same code works without any issues on 0.13.1.
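
   The `InvalidClassException` on `org.apache.iceberg.Schema` (stream classdesc serialVersionUID = 3320367012418887609 vs. local serialVersionUID = -8857144469361102787) looks like the executors deserialize the class from a different Iceberg jar than the one the driver serialized it with. Below is a minimal diagnostic sketch, assuming the internal `spark._jvm` py4j gateway may be used and that the Iceberg classes are visible to the gateway's class loader; it prints which jar the driver loaded the class from and that class's serialVersionUID:
   ```
   # Hypothetical diagnostic, not part of the reproduction above: run on the driver
   # after the SparkSession exists.
   jvm = spark._jvm  # internal PySpark handle to the driver JVM via py4j

   # Which jar did the driver JVM load org.apache.iceberg.Schema from?
   schema_cls = jvm.java.lang.Class.forName("org.apache.iceberg.Schema")
   print(schema_cls.getProtectionDomain().getCodeSource().getLocation().toString())

   # serialVersionUID of the driver-side class, to compare with the values in the error.
   print(jvm.java.io.ObjectStreamClass.lookup(schema_cls).getSerialVersionUID())
   ```
   If this prints 3320367012418887609 (the "stream classdesc" value above), the driver side matches what was serialized, which would point at the executors still loading an older or different iceberg jar.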

