hbpeng0115 opened a new issue, #9418: URL: https://github.com/apache/iceberg/issues/9418
### Query engine

Spark

### Question

Hello, we use Spark 3.2 with Iceberg 1.4.3 and call the `add_files` procedure to add S3 Parquet files to an existing Iceberg table. However, Parquet files written with parquet-mr 1.10 cannot be queried after invoking `add_files`; reading them throws the exception below. Parquet files written with parquet-mr 1.13 can be queried. Does anyone know how to solve this issue? Many thanks!

```
Caused by: org.apache.iceberg.shaded.org.apache.parquet.io.ParquetDecodingException: Can't read value in column [request, context, profile_concrete_event_id, array] repeated int64 array = 170 at value 76034 out of 184067 in current page. repetition level: 1, definition level: 3
    at org.apache.iceberg.parquet.PageIterator.handleRuntimeException(PageIterator.java:220)
    at org.apache.iceberg.parquet.PageIterator.nextLong(PageIterator.java:151)
    at org.apache.iceberg.parquet.ColumnIterator.nextLong(ColumnIterator.java:128)
    at org.apache.iceberg.parquet.ColumnIterator$3.next(ColumnIterator.java:49)
    at org.apache.iceberg.parquet.ColumnIterator$3.next(ColumnIterator.java:46)
    at org.apache.iceberg.parquet.ParquetValueReaders$UnboxedReader.read(ParquetValueReaders.java:246)
    at org.apache.iceberg.parquet.ParquetValueReaders$RepeatedReader.read(ParquetValueReaders.java:467)
    at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419)
    at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
    at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
    at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419)
    at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
    at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:130)
    at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:65)
    at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
    at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:129)
    at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:93)
    at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:130)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
    at org.apache.iceberg.shaded.org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57)
    at org.apache.iceberg.shaded.org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
    at org.apache.iceberg.shaded.org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
    at org.apache.iceberg.shaded.org.apache.parquet.column.values.dictionary.DictionaryValuesReader.readLong(DictionaryValuesReader.java:117)
    at org.apache.iceberg.parquet.PageIterator.nextLong(PageIterator.java:149)
    ... 33 more
```
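
For anyone trying to reproduce this, the following is a minimal sketch of the kind of `add_files` call described in the question. The catalog name `spark_catalog`, the table `db.events`, and the path `s3://example-bucket/events/` are hypothetical placeholders, not the values from the original report, and the helper `build_add_files_sql` is only an illustration. It builds the `CALL` statement as a string so it can be checked without a running Spark session.

```python
def build_add_files_sql(catalog: str, table: str, parquet_path: str) -> str:
    """Build a CALL statement for Iceberg's add_files Spark procedure.

    All three arguments are placeholders chosen by the caller; nothing here
    is taken from the original report.
    """
    # add_files accepts a path-based source written as `parquet`.`<path>`
    source = f"`parquet`.`{parquet_path}`"
    return (
        f"CALL {catalog}.system.add_files("
        f"table => '{table}', "
        f"source_table => '{source}')"
    )


# Hypothetical values, for illustration only.
sql = build_add_files_sql("spark_catalog", "db.events", "s3://example-bucket/events/")
print(sql)

# With a live SparkSession configured for Iceberg, the statement would be
# executed as: spark.sql(sql)
```

After the call succeeds, a plain `SELECT` on the column named in the exception (here `request.context.profile_concrete_event_id`) should be enough to trigger the read path shown in the stack trace, assuming the added files were written by parquet-mr 1.10.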