oku95 commented on issue #8655: URL: https://github.com/apache/iceberg/issues/8655#issuecomment-2096066510
Hi @manuzhang, I'm getting a similar error in an AWS Glue 4.0 Spark environment:

```
24/05/06 00:49:40 ERROR Executor: Exception in task 1.0 in stage 11.0 (TID 20)
java.lang.IllegalStateException: Value at index is null
	at org.apache.iceberg.shaded.org.apache.arrow.vector.BigIntVector.get(BigIntVector.java:112) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
	at org.apache.iceberg.arrow.vectorized.GenericArrowVectorAccessorFactory$LongAccessor.getLong(GenericArrowVectorAccessorFactory.java:257) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
	at org.apache.iceberg.spark.data.vectorized.IcebergArrowColumnVector.getLong(IcebergArrowColumnVector.java:101) ~[iceberg-spark-runtime-3.3_2.12-1.0.0.jar:?]
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:35) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hasNext(Unknown Source) ~[?:?]
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:968) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:314) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$21(FileFormatWriter.scala:257) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.scheduler.Task.run(Task.scala:138) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) ~[spark-core_2.12-3.3.0-amzn-1.jar:?]
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516) ~[spark-core_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) ~[spark-core_2.12-3.3.0-amzn-1.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_402]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_402]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402]
```

I tried disabling the null check by setting `arrow.enable_null_check_for_get=false`, but the job then failed at launch with:

```
--conf spark.executor.extraJavaOptions=-Darrow.enable_null_check_for_get=false --enable-continuous-cloudwatch-log true --scriptLocation s3://stage-pipeline-glue-assets-493140057280/scripts/load_source_iceberg.py --job-language python --JOB_NAME stage-pipeline-load-source-iceberg --
openjdk version "1.8.0_402"
OpenJDK Runtime Environment Corretto-8.402.08.1 (build 1.8.0_402-b08)
OpenJDK 64-Bit Server VM Corretto-8.402.08.1 (build 25.402-b08, mixed mode)
1715002240878 LAUNCH ERROR | Invalid input to --confPlease refer logs for details.
Exception in thread "main" java.lang.IllegalArgumentException: Invalid input to --conf
	at com.amazonaws.services.glue.ArgsParserForSparkProperties.$anonfun$parse$2(ConfigParam.scala:458)
	at com.amazonaws.services.glue.ArgsParserForSparkProperties.$anonfun$parse$2$adapted(ConfigParam.scala:445)
	at scala.collection.immutable.Range.foreach(Range.scala:158)
	at com.amazonaws.services.glue.ArgsParserForSparkProperties.parse(ConfigParam.scala:445)
	at com.amazonaws.services.glue.PrepareLaunchProperties.<init>(PrepareLaunch.scala:222)
	at com.amazonaws.services.glue.PrepareLaunch.<init>(PrepareLaunch.scala:528)
	at com.amazonaws.services.glue.PrepareLaunch.<init>(PrepareLaunch.scala:525)
	at com.amazonaws.services.glue.PrepareLaunch$.main(PrepareLaunch.scala:54)
	at com.amazonaws.services.glue.PrepareLaunch.main(PrepareLaunch.scala)
```

Any ideas what might be causing this?