rjayapalan opened a new issue, #9689:
URL: https://github.com/apache/iceberg/issues/9689

   ### Apache Iceberg version
   
   1.4.2
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   I am aware of a similar issue that was addressed in the Iceberg 1.4.1 release:
   https://github.com/apache/iceberg/pull/8834
   
   But this one seems different to me. The error comes up after performing a DDL change on an existing Iceberg table (adding a new column) and then running the `rewrite_manifests` maintenance operation via the Spark stored procedure.
   
   I was able to reproduce the issue following this pattern (ALTER TABLE ... -> rewrite_manifests), as sketched below. I am not sure what is causing this, or whether it is a bug in the first place.
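   
   Roughly, the sequence looks like this (a minimal sketch, not my actual job; the catalog, table, column, and S3 path names below are placeholders):
   
   ```python
   from pyspark.sql import SparkSession
   
   # Minimal reproduction sketch. All names here (catalog, database, table,
   # column, S3 paths) are placeholders, not the real ones from my job.
   spark = (
       SparkSession.builder
       .config("spark.sql.extensions",
               "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
       .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
       .config("spark.sql.catalog.my_catalog.type", "hadoop")
       .config("spark.sql.catalog.my_catalog.warehouse", "s3://bucket/warehouse")
       .getOrCreate()
   )
   
   # 1. Start from an existing Iceberg table that already has data in it
   spark.sql("CREATE TABLE my_catalog.db.tbl (id bigint, data string) USING iceberg")
   spark.sql("INSERT INTO my_catalog.db.tbl VALUES (1, 'a'), (2, 'b')")
   
   # 2. DDL change: add a new column
   spark.sql("ALTER TABLE my_catalog.db.tbl ADD COLUMN extra_col string")
   
   # 3. Rewrite manifests via the Spark stored procedure
   spark.sql("CALL my_catalog.system.rewrite_manifests('db.tbl')")
   
   # 4. Read the table back and write it out as Parquet. This read is the
   #    step that fails with "length (...) cannot be smaller than -1".
   spark.table("my_catalog.db.tbl").write.parquet("s3://bucket/tmp/unload")
   ```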
   
   Error stacktrace:
   
   ```
   An error was encountered:
   An error occurred while calling o431.parquet.
   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 432.0 failed 4 times, most recent failure: Lost task 3.3 in stage 432.0 (TID 23496) ([2600:1f18:610f:a400:cea3:e3ca:45b8:9398] executor 409): org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to s3://cs-dataeng-staging/rjayapalan/tmp/crm_dev_unload.
       at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:789)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:421)
       at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
       at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
       at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
       at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
       at org.apache.spark.scheduler.Task.run(Task.scala:141)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:563)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:566)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:750)
   Caused by: java.lang.IllegalArgumentException: requirement failed: length (-6235972) cannot be smaller than -1
       at scala.Predef$.require(Predef.scala:281)
       at org.apache.spark.rdd.InputFileBlockHolder$.set(InputFileBlockHolder.scala:79)
       at org.apache.spark.rdd.InputFileBlockHolder.set(InputFileBlockHolder.scala)
       at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:93)
       at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:43)
       at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:141)
       at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
       at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
       at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
       at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
       at scala.Option.exists(Option.scala:376)
       at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
       at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
       at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
       at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
       at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:91)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:404)
       at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1575)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:411)
       ... 15 more
   ```
   
   Environment: EMR 6.15 || Spark 3.4 || Iceberg 1.4.2

