jotarada commented on issue #11191: URL: https://github.com/apache/iceberg/issues/11191#issuecomment-2575005020
I'm testing a migration of some workloads from Spark 3.3.1 and Iceberg 1.4.3 to Spark 3.5.1 and Iceberg 1.4.3, and I hit an even worse problem:

```
Caused by: java.util.concurrent.ExecutionException: java.util.NoSuchElementException: key not found: totalDataFileSize
```

Can this be related?

**Full stack trace:**

```
Client: Application diagnostics message: User class threw exception: java.util.NoSuchElementException: key not found: totalDataFileSize
    at scala.collection.MapLike.default(MapLike.scala:236)
    at scala.collection.MapLike.default$(MapLike.scala:235)
    at scala.collection.AbstractMap.default(Map.scala:65)
    at scala.collection.MapLike.apply(MapLike.scala:144)
    at scala.collection.MapLike.apply$(MapLike.scala:143)
    at scala.collection.AbstractMap.apply(Map.scala:65)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.doExecuteColumnar(DataSourceV2ScanExecBase.scala:220)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.doExecuteColumnar$(DataSourceV2ScanExecBase.scala:214)
    at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.doExecuteColumnar(BatchScanExec.scala:36)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
    at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:218)
    at org.apache.spark.sql.execution.InputAdapter.doExecuteColumnar(WholeStageCodegenExec.scala:521)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
    at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:218)
    at org.apache.spark.sql.execution.ColumnarToRowExec.inputRDDs(Columnar.scala:204)
    at org.apache.spark.sql.execution.aggregate.AggregateCodegenSupport.inputRDDs(AggregateCodegenSupport.scala:89)
    at org.apache.spark.sql.execution.aggregate.AggregateCodegenSupport.inputRDDs$(AggregateCodegenSupport.scala:88)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:49)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:751)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:195)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:191)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:141)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:141)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture$lzycompute(ShuffleExchangeExec.scala:146)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture(ShuffleExchangeExec.scala:145)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$submitShuffleJob$1(ShuffleExchangeExec.scala:73)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.submitShuffleJob(ShuffleExchangeExec.scala:73)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.submitShuffleJob$(ShuffleExchangeExec.scala:72)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.submitShuffleJob(ShuffleExchangeExec.scala:120)
    at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.shuffleFuture$lzycompute(QueryStageExec.scala:188)
    at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.shuffleFuture(QueryStageExec.scala:188)
    at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.doMaterialize(QueryStageExec.scala:190)
    at org.apache.spark.sql.execution.adaptive.QueryStageExec.materialize(QueryStageExec.scala:62)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$5(AdaptiveSparkPlanExec.scala:303)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$5$adapted(AdaptiveSparkPlanExec.scala:301)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:301)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:273)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:418)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeTake(AdaptiveSparkPlanExec.scala:395)
    at org.apache.spark.sql.Dataset.$anonfun$isEmpty$1(Dataset.scala:654)
    at org.apache.spark.sql.Dataset.$anonfun$isEmpty$1$adapted(Dataset.scala:653)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4322)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:547)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:4320)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4320)
    at org.apache.spark.sql.Dataset.isEmpty(Dataset.scala:653)
    at com.pipeline.data.IcebergPipeline$IcebergPipeline.handleDuplicates(IcebergPipeline.scala:253)
```
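For context on the error shape: the `scala.collection.MapLike.apply` frames at the top of the trace mean a Scala `Map` was indexed with a key it does not contain, inside `DataSourceV2ScanExecBase.doExecuteColumnar`. Below is a minimal sketch of that pattern; the object, map, and metric names other than `totalDataFileSize` are hypothetical stand-ins for illustration, not Spark source.

```scala
// Minimal sketch (hypothetical names, not the actual Spark code) of the
// failing pattern in the trace: Map.apply on a missing key raises exactly
// "java.util.NoSuchElementException: key not found: <key>".
object MetricLookupSketch extends App {
  // Hypothetical metrics map, standing in for whatever the scan node registered.
  val registeredMetrics: Map[String, Long] = Map("numOutputRows" -> 0L)

  // Hypothetical names reported at runtime; "totalDataFileSize" (the Iceberg
  // custom metric named in the trace) is missing from the map above.
  val reportedMetrics: Seq[String] = Seq("numOutputRows", "totalDataFileSize")

  reportedMetrics.foreach { name =>
    // Map.apply (scala.collection.MapLike.apply in the trace) throws here:
    // java.util.NoSuchElementException: key not found: totalDataFileSize
    val value = registeredMetrics(name)
    println(s"$name = $value")
  }
}
```

A `registeredMetrics.get(name)` lookup would return `None` instead of throwing, so the exception means the two sides genuinely disagree about which metric names exist; that kind of disagreement could be consistent with mismatched Spark and Iceberg runtime jars on the classpath, though that is only a guess from the trace.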