hbpeng0115 opened a new issue, #9418:
URL: https://github.com/apache/iceberg/issues/9418
### Query engine
Spark
### Question
Hello, we use Spark 3.2 and Iceberg 1.4.3 and call the `add_files` procedure to add Parquet files stored on S3 to an existing Iceberg table. However, Parquet files written by parquet-mr 1.10 cannot be queried after the `add_files` call; reading them throws the exception below. Parquet files written by parquet-mr 1.13 can be queried normally. Does anyone know how to solve this issue? Many thanks~
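To separate files by writer version before calling `add_files`, the `created_by` string in each Parquet footer can be inspected (for example through parquet-mr's `ParquetMetadata.getFileMetaData().getCreatedBy()`). The sketch below is a hypothetical helper, not part of Iceberg or parquet-mr, that parses strings of the form `parquet-mr version X.Y.Z (build ...)` and flags files older than a given version. The 1.13 cutoff only mirrors the observation above and is an assumption, not a confirmed boundary.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Iceberg or parquet-mr): parses the
// created_by string stored in a Parquet file footer.
public class CreatedByVersion {
    // Matches writer strings such as "parquet-mr version 1.10.1 (build ...)".
    private static final Pattern PARQUET_MR =
            Pattern.compile("^parquet-mr version (\\d+)\\.(\\d+)(?:\\.(\\d+))?");

    // Returns {major, minor, patch}, or empty if the file was not written by parquet-mr.
    public static Optional<int[]> parse(String createdBy) {
        if (createdBy == null) {
            return Optional.empty();
        }
        Matcher m = PARQUET_MR.matcher(createdBy.trim());
        if (!m.find()) {
            return Optional.empty();
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        int patch = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return Optional.of(new int[] {major, minor, patch});
    }

    // True only for parquet-mr files older than major.minor; other writers return false.
    public static boolean olderThan(String createdBy, int major, int minor) {
        return parse(createdBy)
                .map(v -> v[0] < major || (v[0] == major && v[1] < minor))
                .orElse(false);
    }

    public static void main(String[] args) {
        // Illustrative created_by values; the build hashes are elided placeholders.
        String[] samples = {
            "parquet-mr version 1.10.1 (build ...)",
            "parquet-mr version 1.13.1 (build ...)",
            "parquet-cpp-arrow version 14.0.1",
        };
        for (String s : samples) {
            System.out.println(s + " -> older than 1.13: " + olderThan(s, 1, 13));
        }
    }
}
```

Files from other writers return `false` here because the cutoff is only meaningful for parquet-mr versions.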
```
Caused by: org.apache.iceberg.shaded.org.apache.parquet.io.ParquetDecodingException: Can't read value in column [request, context, profile_concrete_event_id, array] repeated int64 array = 170 at value 76034 out of 184067 in current page. repetition level: 1, definition level: 3
	at org.apache.iceberg.parquet.PageIterator.handleRuntimeException(PageIterator.java:220)
	at org.apache.iceberg.parquet.PageIterator.nextLong(PageIterator.java:151)
	at org.apache.iceberg.parquet.ColumnIterator.nextLong(ColumnIterator.java:128)
	at org.apache.iceberg.parquet.ColumnIterator$3.next(ColumnIterator.java:49)
	at org.apache.iceberg.parquet.ColumnIterator$3.next(ColumnIterator.java:46)
	at org.apache.iceberg.parquet.ParquetValueReaders$UnboxedReader.read(ParquetValueReaders.java:246)
	at org.apache.iceberg.parquet.ParquetValueReaders$RepeatedReader.read(ParquetValueReaders.java:467)
	at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419)
	at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
	at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
	at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419)
	at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
	at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:130)
	at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:65)
	at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
	at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:129)
	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:93)
	at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:130)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
	at org.apache.iceberg.shaded.org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57)
	at org.apache.iceberg.shaded.org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
	at org.apache.iceberg.shaded.org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
	at org.apache.iceberg.shaded.org.apache.parquet.column.values.dictionary.DictionaryValuesReader.readLong(DictionaryValuesReader.java:117)
	at org.apache.iceberg.parquet.PageIterator.nextLong(PageIterator.java:149)
	... 33 more
```
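The error reports repetition level 1 and definition level 3 for the column path `[request, context, profile_concrete_event_id, array]`. In Parquet, a leaf column's maximum definition level is the number of optional or repeated fields on its path, and its maximum repetition level is the number of repeated fields. The path also ends in a bare `repeated int64 array`, which resembles the legacy two-level list layout rather than the three-level `list`/`element` layout described in the Parquet format specification. The sketch below computes both maxima; the repetitions of `request`, `context` and `profile_concrete_event_id` are not shown in the error, so the two schemas it checks are assumptions.

```java
import java.util.List;

// Sketch of how Parquet derives the maximum repetition and definition levels
// of a leaf column from the repetition of each field on its path.
public class ParquetLevels {
    public enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    // Every OPTIONAL or REPEATED field on the path adds one definition level.
    public static int maxDefinitionLevel(List<Repetition> path) {
        int level = 0;
        for (Repetition r : path) {
            if (r != Repetition.REQUIRED) {
                level++;
            }
        }
        return level;
    }

    // Every REPEATED field on the path adds one repetition level.
    public static int maxRepetitionLevel(List<Repetition> path) {
        int level = 0;
        for (Repetition r : path) {
            if (r == Repetition.REPEATED) {
                level++;
            }
        }
        return level;
    }

    public static void main(String[] args) {
        // Assumption: request, context and profile_concrete_event_id are all OPTIONAL;
        // only "repeated int64 array" is confirmed by the error message.
        List<Repetition> allOptional = List.of(
                Repetition.OPTIONAL, Repetition.OPTIONAL, Repetition.OPTIONAL, Repetition.REPEATED);
        // Alternative assumption: request is REQUIRED.
        List<Repetition> requestRequired = List.of(
                Repetition.REQUIRED, Repetition.OPTIONAL, Repetition.OPTIONAL, Repetition.REPEATED);

        System.out.println("all optional: max d = " + maxDefinitionLevel(allOptional)
                + ", max r = " + maxRepetitionLevel(allOptional));
        System.out.println("request required: max d = " + maxDefinitionLevel(requestRequired)
                + ", max r = " + maxRepetitionLevel(requestRequired));
    }
}
```

With all three ancestors optional, the maxima are d = 4 and r = 1, and a definition level of 3 would mark an empty list; with `request` required, the maximum definition level drops to 3, which would mark a present element. Which reading applies depends on the actual schema of the 1.10 files.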
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]