gaoshihang opened a new issue, #9497: URL: https://github.com/apache/iceberg/issues/9497
### Apache Iceberg version

1.4.3 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

We have a two-level Parquet list; the schema is like the one shown below. Now, if this array is an empty array (`[]`) and we use the `add_files` procedure to add this Parquet file to a table, querying the table throws this exception:

```
Caused by: org.apache.parquet.io.ParquetDecodingException: Can't read value in column [a, array] repeated int32 array = 2 at value 4 out of 4 in current page. repetition level: -1, definition level: -1
	at org.apache.iceberg.parquet.PageIterator.handleRuntimeException(PageIterator.java:220)
	at org.apache.iceberg.parquet.PageIterator.nextInteger(PageIterator.java:141)
	at org.apache.iceberg.parquet.ColumnIterator.nextInteger(ColumnIterator.java:121)
	at org.apache.iceberg.parquet.ColumnIterator$2.next(ColumnIterator.java:41)
	at org.apache.iceberg.parquet.ColumnIterator$2.next(ColumnIterator.java:38)
	at org.apache.iceberg.parquet.ParquetValueReaders$UnboxedReader.read(ParquetValueReaders.java:246)
	at org.apache.iceberg.parquet.ParquetValueReaders$RepeatedReader.read(ParquetValueReaders.java:467)
	at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419)
	at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
	at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:130)
	at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:65)
	at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
	at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:129)
	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:93)
	at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:130)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.parquet.io.ParquetDecodingException: could not read int
	at org.apache.parquet.column.values.plain.PlainValuesReader$IntegerPlainValuesReader.readInteger(PlainValuesReader.java:114)
	at org.apache.iceberg.parquet.PageIterator.nextInteger(PageIterator.java:139)
	... 32 more
Caused by: java.io.EOFException
	at org.apache.parquet.bytes.SingleBufferInputStream.read(SingleBufferInputStream.java:52)
	at org.apache.parquet.bytes.LittleEndianDataInputStream.readInt(LittleEndianDataInputStream.java:347)
	at org.apache.parquet.column.values.plain.PlainValuesReader$IntegerPlainValuesReader.readInteger(PlainValuesReader.java:112)
	... 33 more
```

Reading the code in iceberg-parquet, it seems this do-while loop will never exit:

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
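(The schema screenshot did not survive the mail archive. Based on the column path `[a, array]` and the type `repeated int32 array` in the stack trace above, the legacy two-level list schema was presumably of this shape — a reconstruction, not the reporter's exact schema:)

```
message spark_schema {
  optional group a (LIST) {
    repeated int32 array;
  }
}
```

Note there is no inner `element` group wrapping the repeated field, which is what distinguishes the legacy two-level encoding from the standard three-level `LIST` layout.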
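(For context — this is a hypothetical sketch in Python, not Iceberg's actual reader code: it shows how definition/repetition levels encode an empty two-level list, and why a do-while-style loop that consumes a value before checking the levels would read past the end of the value stream and hit the `EOFException` seen above. All names here are mine.)

```python
# Assumed two-level schema: optional group a { repeated int32 array }
#   definition level 0 -> the list itself is null
#   definition level 1 -> the list exists but is EMPTY (no value stored)
#   definition level 2 -> a real int32 element follows in the value stream
# A reader that unconditionally reads one value per list entry (do-while
# style) would call next(it) even at definition level 1 and over-read.
def read_lists(levels, values):
    it = iter(values)
    rows = []
    for d, r in levels:
        if r == 0:
            # repetition level 0 starts a new row
            rows.append([] if d >= 1 else None)
        if d == 2:
            # only the max definition level carries an actual value
            rows[-1].append(next(it))
    return rows

# Three rows: [1, 2], [] (the empty list), [3]
levels = [(2, 0), (2, 1), (1, 0), (2, 0)]
print(read_lists(levels, [1, 2, 3]))  # [[1, 2], [], [3]]
```

The `(1, 0)` pair is the crux: it encodes "a new row whose list is present but empty" and consumes no value, so a loop that assumes at least one value per list drains the page early.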