[I] why Iceberg Spark add_files procedure does not support parquet file with parquet-mr 1.10 version? [iceberg]

via GitHub Fri, 05 Jan 2024 01:50:19 -0800


hbpeng0115 opened a new issue, #9418:
URL: https://github.com/apache/iceberg/issues/9418


   ### Query engine
   
   Spark
   
   ### Question
   
   Hello, we use Spark 3.2 + Iceberg 1.4.3 and call the add_files procedure 
to add S3 Parquet files to an existing Iceberg table. However, Parquet files 
written with parquet-mr 1.10 cannot be queried after invoking add_files; the 
query throws the following exception. Parquet files written with parquet-mr 
1.13 can be queried. Does anyone know how to solve this issue? Many thanks~
   ```
   Caused by: org.apache.iceberg.shaded.org.apache.parquet.io.ParquetDecodingException: Can't read value in column [request, context, profile_concrete_event_id, array] repeated int64 array = 170 at value 76034 out of 184067 in current page. repetition level: 1, definition level: 3
        at org.apache.iceberg.parquet.PageIterator.handleRuntimeException(PageIterator.java:220)
        at org.apache.iceberg.parquet.PageIterator.nextLong(PageIterator.java:151)
        at org.apache.iceberg.parquet.ColumnIterator.nextLong(ColumnIterator.java:128)
        at org.apache.iceberg.parquet.ColumnIterator$3.next(ColumnIterator.java:49)
        at org.apache.iceberg.parquet.ColumnIterator$3.next(ColumnIterator.java:46)
        at org.apache.iceberg.parquet.ParquetValueReaders$UnboxedReader.read(ParquetValueReaders.java:246)
        at org.apache.iceberg.parquet.ParquetValueReaders$RepeatedReader.read(ParquetValueReaders.java:467)
        at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419)
        at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
        at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
        at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419)
        at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
        at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:130)
        at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:65)
        at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
        at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:129)
        at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:93)
        at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:130)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
        at org.apache.iceberg.shaded.org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57)
        at org.apache.iceberg.shaded.org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
        at org.apache.iceberg.shaded.org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
        at org.apache.iceberg.shaded.org.apache.parquet.column.values.dictionary.DictionaryValuesReader.readLong(DictionaryValuesReader.java:117)
        at org.apache.iceberg.parquet.PageIterator.nextLong(PageIterator.java:149)
        ... 33 more
   ```
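
For readers trying to reproduce the report, the following is a minimal sketch of how the add_files call described above might be assembled. It only builds the Spark SQL statement as a string and does not need a running Spark session. The helper name `build_add_files_call`, the catalog name `spark_catalog`, the table `db.events`, and the bucket path are all hypothetical placeholders, not values taken from the issue; the `table` and `source_table` arguments follow the documented form of the Iceberg add_files procedure.

```python
def build_add_files_call(catalog: str, table: str, parquet_path: str) -> str:
    """Build a Spark SQL CALL statement for Iceberg's add_files procedure.

    Hypothetical helper for illustration only; all argument values are
    supplied by the caller and are not taken from the original report.
    """
    # Reject characters that would break out of the quoted literals below.
    for value in (catalog, table, parquet_path):
        if "'" in value or "`" in value:
            raise ValueError(f"unsupported character in identifier or path: {value!r}")

    # add_files accepts a path-based source written as `parquet`.`<location>`.
    source_table = f"`parquet`.`{parquet_path}`"
    return (
        f"CALL {catalog}.system.add_files("
        f"table => '{table}', "
        f"source_table => '{source_table}')"
    )


if __name__ == "__main__":
    # Placeholder names; substitute the real catalog, table, and S3 location.
    stmt = build_add_files_call(
        "spark_catalog", "db.events", "s3://example-bucket/events/"
    )
    print(stmt)
    # With a live SparkSession named `spark`, the statement would be run as:
    # spark.sql(stmt)
```

Running the same statement once against a directory of parquet-mr 1.10 files and once against parquet-mr 1.13 files, then querying the table after each, should be enough to isolate whether the failure depends on the writer version.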


